HTML Capsule

Observation 1

Observation 1: HTML is becoming the default output format for AI-assisted work.

Markdown has become the dominant file format used by agents to communicate with us… But as agents have become more and more powerful, I have felt that markdown has become a restricting format… I've started preferring HTML as an output format instead of Markdown and increasingly see this being used by others on the Claude Code team.

— Thariq Shihipar · @trq212

The same direction shows up in Andrej Karpathy's public prompting notes (archive), where “structure your response as HTML” is a default ask rather than a power-user trick. Where it surfaces matters: not in threads about HTML as a document format, but in advice on how to get useful output from an LLM.

Question 1

Question 1: How do you get HTML out of an LLM?

Two answers in the wild. They produce very different artifacts.

Answer 1a

The common-knowledge answer

Answer 1a: Just ask the LLM for HTML.

It's the default move. Karpathy recommends it as a prompting tactic; Thariq uses it on the Claude Code team (both in Observation 1). The LLM produces HTML, the bytes render in a browser, you can read what's there.

What you don't get: any way for the file to say what it is. No manifest, no UUID, no integrity hash, no declared capabilities, no contract on what's inside. The file can't tell the next reader, human or LLM, what tool made it, what version it represents, or whether anyone has touched its bytes. Fine for "show me what you mean"; lossy for everything else.

Answer 1b

The structured answer

Answer 1b: Ask for a Capsule.

The Core spec is one page — twelve numbered rules, designed to paste into an LLM prompt. What comes back is a sealed .html file with all of this baked in:

Manifest — inline JSON declaring what the artifact is, who produced it, what it claims to be.
Integrity hash over data + manifest — tampering is detectable.
UUID + parents[] — stable identity, explicit lineage.
Declared capabilities (Rule 7) — export, copy, download all guaranteed-implemented.
Pre-rendered content (Rule 12) — the file still works with JavaScript disabled.
No network dependencies (Rule 2) — every byte the file needs is inline.
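
The integrity item in the list above can be made concrete with a minimal sketch. It assumes the digest is SHA-256 over a sorted-key JSON serialization of the data followed by the manifest, and that it is stored in a manifest field named content_hash. The serialization, the "sha256:" prefix, and the helper names are illustrative assumptions, not taken from the spec.

```python
import hashlib
import json


def canonical_bytes(value) -> bytes:
    """Serialize with sorted keys and no extra whitespace so the same
    logical content always produces the same bytes (assumed scheme)."""
    return json.dumps(
        value, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def content_hash(manifest: dict, data) -> str:
    """SHA-256 over data + manifest, skipping the hash field itself.

    Assumption: the digest lives in manifest["content_hash"]; the real
    spec may name or scope this field differently.
    """
    unsigned = {k: v for k, v in manifest.items() if k != "content_hash"}
    digest = hashlib.sha256()
    digest.update(canonical_bytes(data))
    digest.update(b"\n")
    digest.update(canonical_bytes(unsigned))
    return "sha256:" + digest.hexdigest()


def seal(manifest: dict, data) -> dict:
    """Return a copy of the manifest stamped with its integrity hash."""
    return {**manifest, "content_hash": content_hash(manifest, data)}


def is_intact(manifest: dict, data) -> bool:
    """True when the stored hash still matches the current content."""
    return manifest.get("content_hash") == content_hash(manifest, data)


manifest = seal(
    {"uuid": "00000000-0000-4000-8000-000000000001",
     "capsule_version": "1.0.0",
     "parents": []},
    data={"points": [1, 2, 3]},
)
print(is_intact(manifest, {"points": [1, 2, 3]}))  # True
print(is_intact(manifest, {"points": [1, 2, 4]}))  # False
```

Excluding the hash field from its own input is what lets the manifest carry the digest without invalidating it. A real producer would likely want a standardized canonicalization rather than ad hoc sorted-key JSON.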

The producer (LLM) becomes a first-class spec-aware participant — not "make some HTML" but "produce a Capsule per Core v0.3.0."
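
Asking for output "per Core" also means the result can be checked mechanically. As a rough sketch of one such check, the snippet below scans an HTML string for attributes that would make a browser fetch something over the network, which is the property Rule 2 describes. The attribute list, the decision to ignore plain `<a href>` links (treated here as navigation, not dependencies), and the function names are assumptions for illustration. It does not inspect CSS `url()` values or script bodies, so it is a partial check at best.

```python
from html.parser import HTMLParser

# Attributes that can trigger a fetch when they hold a remote URL (assumed list).
URL_ATTRS = {"src", "href", "poster", "data", "action"}
NETWORK_PREFIXES = ("http://", "https://", "//")


class NetworkRefFinder(HTMLParser):
    """Collects (tag, attribute, value) triples that point at the network."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        # Assumption: an ordinary hyperlink is navigation, not a dependency.
        if tag == "a":
            return
        for name, value in attrs:
            if name in URL_ATTRS and value:
                if value.strip().lower().startswith(NETWORK_PREFIXES):
                    self.refs.append((tag, name, value))


def find_network_refs(html: str) -> list:
    """Return every network-bound attribute found in the markup."""
    finder = NetworkRefFinder()
    finder.feed(html)
    finder.close()
    return finder.refs


sealed = ('<html><body><img src="data:image/png;base64,AAAA">'
          '<a href="https://example.com">source</a></body></html>')
leaky = ('<html><head><script src="https://cdn.example.com/lib.js">'
         '</script></head></html>')

print(find_network_refs(sealed))  # []
print(find_network_refs(leaky))   # [('script', 'src', 'https://cdn.example.com/lib.js')]
```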

Core (one page) · full spec · worked examples.

Question 2

Question 2: Once you have the HTML, what do you do with it?

Three answers in increasing order of investment — view it, host it, build your own producer.

Answer 2a

View it

Answer 2a: Open it in a browser. Or upload to S3.

How do I view the HTML file? I tend to just open it in a browser locally (you can ask Claude to open it), or upload to S3 if you want a shareable link.

— Thariq Shihipar · @trq212 · FAQ

The default. The bytes open in any browser, no setup. Caveat: no shareable URL out of the box — you bring your own hosting.

Answer 2b

Host it

Answer 2b: Host it on htmlbin.dev.

API for agents to share HTML. Agent-native, end to end. Drop HTML — get a public URL.

— htmlbin.dev positioning · Utkarsh Sengar (@utsengar)

Format and host are different jobs.

Multiple hosting projects have independently converged on the same minimal shape — drop HTML, get a URL — and stayed out of the format-discipline business. htmlbin.dev launched May 2026, the same week as htmlcapsule. The two layers compose: a valid Capsule can be served by htmlbin or self-hosted, and the format stays hosting-agnostic. Architecture + positioning archive · format/host split documented in spec.

Answer 2c

Build your own producer · maintainer's case study

Answer 2c: Build your own producer.

When I'm building tooling for my own work — geospatial maps and reports for clients and colleagues — I've designed the map-making tool to emit HTML conforming to the Capsule spec. Clients get exactly what they need: standalone (no database to maintain), portable, offline-capable, archive-permanent. And because it's HTML, the artifact stays AI/LLM-friendly downstream.

File sizes are larger than equivalent server-backed reports. For my use case the tradeoff is worth it: no database to maintain, the artifact survives independent of me, the recipient opens it without setup.

— Luke Schuss · maintainer · Vancouver · lukeschuss.com

Read the spec → See examples

Observation 2

Observation 2: File over app still holds.

The principle, from Steph Ango (Kepano):

If you want to create digital artifacts that last, they must be files you can control, in formats that are easy to retrieve and read.

— Steph Ango (@kepano) · "File over app"

The Obsidian community drifted the other way. When you need a plugin stack — third-party or otherwise — to get the layout, look, and feel you want, the file isn't really doing the work anymore. You're straight back at app over file.

Observation 3

Observation 3: HTML diffs are noisy. Lineage is the bigger problem.

The diff problem, from Thariq again:

What about version control? This is honestly one of the biggest downsides of HTML — HTML diffs are noisy and hard to review compared to Markdown.

— Thariq Shihipar · @trq212 · FAQ

True — raw HTML diffs are noisy because HTML is closer to a runtime artifact than a writing format. A small visual change can move wrappers, attribute orders, inline styles, scripts, or embedded assets in ways that make a normal Git diff look much larger than the real change. But raw diffs are only one layer. There are three separate questions:

Raw diff: what changed in the HTML bytes?
Semantic diff: what changed in the artifact's content, structure, style, data, assets, or runtime behavior?
Lineage: what artifact is this, what version is it, where did it come from, and what does it replace?

Lineage is the most basic of the three — and the one people most often lose. Chat-hosted artifacts make it worse: Claude's artifacts and ChatGPT's canvas mutate in place as you iterate, with no natural versioning anchor. You can fork to a new conversation, but the new one starts fresh with no explicit "this is v3" link back. The chat keeps going; the artifact loses its history.

Capsule doesn't claim raw HTML diffs become pleasant. It claims artifacts need identity and lineage first. Once an artifact has a manifest, UUID, version, parents, timestamps, and integrity hashes, tools can build cleaner review surfaces around the HTML instead of forcing humans to read the HTML directly. Identity and lineage are addressable — and they unlock the rest.
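
To illustrate what "addressable" lineage could buy a tool, here is a small sketch that walks parent links to reconstruct an artifact's ancestry. It assumes manifests have already been extracted into a dictionary keyed by uuid and that each parents[] entry is an object with a uuid key. The index shape and the function name are hypothetical.

```python
from collections import deque


def ancestry(start_uuid: str, manifests: dict) -> list:
    """Return uuids from the starting artifact back through its known
    ancestors, breadth-first, visiting each one once."""
    seen = set()
    order = []
    queue = deque([start_uuid])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        order.append(current)
        manifest = manifests.get(current)
        if manifest is None:
            # Parent is referenced but its file is not in this collection.
            continue
        queue.extend(parent["uuid"] for parent in manifest.get("parents", []))
    return order


index = {
    "a": {"uuid": "a", "capsule_version": "1.0.0", "parents": []},
    "b": {"uuid": "b", "capsule_version": "1.0.0",
          "parents": [{"uuid": "a", "title": "Draft"}]},
    "c": {"uuid": "c", "capsule_version": "1.0.0",
          "parents": [{"uuid": "b", "title": "Review copy"}]},
}
print(ancestry("c", index))  # ['c', 'b', 'a']
```

The walk still reports an ancestor whose file is missing, since the child's manifest names it. That is the practical difference from filename-based versioning, where a missing file leaves no trace.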

Question 3

Question 3: How do you know which version of an artifact you're looking at?

Three answers, weakest to strongest.

Answer 3a

Filename versioning

Answer 3a: `v1.html`, `v2.html`, `v3-final.html`, `v3-real-final.html`.

The default. The version lives outside the file — in the filename, the email subject, the folder name. People work this way because they have to.

The cost: nothing inside the file knows about its own version. When the file is renamed, forwarded, copy-pasted, or unzipped from an archive, the version information rides with the filename. If the filename gets dropped — or "cleaned up" — the lineage breaks. The artifact can't speak for itself.

Answer 3b

Identity + lineage, recorded inside the file

Answer 3b: `uuid`, `capsule_version`, `parents[]` — in the manifest.

Each Capsule carries three identity fields inside its own inline JSON manifest:

uuid — a stable identifier for this file (immutable; mint once, never change).
capsule_version — a version string the producer maintains (semver-friendly; bumped on each meaningful change).
parents[] — the UUIDs and titles of capsules this one was forked or derived from.

So the artifact knows what it is and what it descended from, regardless of filename. Paste a Capsule into a new editing session, and a capsule-aware producer records the original's UUID in the new Capsule's parents[] array. The lineage chain stays intact across renames, forwards, and zip files, because identity lives in the bytes, not the filename.
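
A minimal sketch of how a producer might handle these three fields, under two assumptions that the text above suggests but does not spell out: a revision keeps its uuid and bumps capsule_version, while a fork mints a fresh uuid and lists the source in parents[]. The title and created_at fields and both function names are hypothetical.

```python
import uuid
from datetime import datetime, timezone


def revise(manifest: dict, new_version: str) -> dict:
    """Same artifact, new version: the uuid is left unchanged."""
    return {**manifest, "capsule_version": new_version}


def derive(parent: dict, title: str, version: str = "1.0.0") -> dict:
    """New artifact forked from `parent`: fresh uuid, explicit lineage.

    The parent manifest itself is not modified.
    """
    return {
        "uuid": str(uuid.uuid4()),
        "title": title,
        "capsule_version": version,
        "parents": [{"uuid": parent["uuid"], "title": parent["title"]}],
        "created_at": datetime.now(timezone.utc).isoformat(),
    }


original = {
    "uuid": "00000000-0000-4000-8000-000000000001",
    "title": "Q3 site map",
    "capsule_version": "1.0.0",
    "parents": [],
}
fork = derive(original, "Q3 site map (client copy)")
print(fork["parents"])
print(revise(original, "1.1.0")["uuid"] == original["uuid"])
```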

This answers "which version is this?" and "where did it come from?". It doesn't yet solve the HTML-diff problem itself; for that, see Answer 3c on the review surfaces that can be built on top of this identity foundation. Full spec · Core (one page).

Git tracks files. Capsule tracks artifacts.

Answer 3c

Capsule-aware diff · open work

Answer 3c: The HTML is the artifact. The manifest is the review surface.

Raw HTML diffs are noisy because the file is dense with runtime detail. But a Capsule has five required blocks, each addressable on its own — and a capsule-diff tool layered on the spec doesn't need to make humans read raw HTML at all. It compares each block separately and presents deltas at the right grain:

Review layer	What it compares	Why it helps
Raw HTML diff	Source bytes	Complete, but noisy
Manifest diff	UUIDs, versions, parents, integrity, declared metadata	High-signal artifact review
Content diff	Extracted text / content regions	Human-readable editorial review
Data diff	Embedded JSON, tables, datasets, structured payloads	Shows factual / data changes
Style diff	CSS blocks, tokens, layout declarations	Separates visual changes from content changes
Asset diff	Images, fonts, media, embedded files by hash	Shows whether bundled materials changed
Runtime diff	Scripts, permissions, interaction logic	Flags behavior changes
Render diff	Screenshot or DOM snapshot before/after	Shows actual visual impact

Output from such a tool might look like:

Capsule diff: a.html → b.html
Manifest: capsule_version 1.0.0 → 1.1.0; +1 parent
Data: 3 entries added, 1 modified
Content: +120 chars of visible text
Style: no change
Runtime: no change
Integrity: content_hash changed (expected; manifest re-stamped)

Not built yet. Parked in spec Appendix E until someone hits a real diff-review problem worth solving. When it materializes, the diff tool sits as a layer on the spec, not a spec change — the spec's existing structural discipline is exactly the foundation that makes a semantic diff tractable.
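
Since no such tool exists yet, the following is only a sketch of what the manifest layer of that comparison could look like. It assumes both manifests are already parsed into dictionaries and that parents[] entries carry a uuid key. The function name and output wording are illustrative, not a proposed interface.

```python
def diff_manifests(old: dict, new: dict) -> list:
    """Return human-readable lines describing how two manifests differ."""
    lines = []
    # Scalar fields: report any key whose value changed, appeared, or vanished.
    for key in sorted(set(old) | set(new)):
        if key == "parents":
            continue
        before, after = old.get(key), new.get(key)
        if before != after:
            lines.append(f"{key}: {before} -> {after}")
    # Lineage: compare parent sets by uuid so reordering is not reported.
    old_parents = {p["uuid"] for p in old.get("parents", [])}
    new_parents = {p["uuid"] for p in new.get("parents", [])}
    added = len(new_parents - old_parents)
    removed = len(old_parents - new_parents)
    if added:
        lines.append(f"parents: +{added}")
    if removed:
        lines.append(f"parents: -{removed}")
    return lines


a = {"uuid": "u1", "capsule_version": "1.0.0", "parents": [],
     "content_hash": "sha256:aaa"}
b = {"uuid": "u1", "capsule_version": "1.1.0",
     "parents": [{"uuid": "p1", "title": "Base map"}],
     "content_hash": "sha256:bbb"}

for line in diff_manifests(a, b):
    print(line)
# capsule_version: 1.0.0 -> 1.1.0
# content_hash: sha256:aaa -> sha256:bbb
# parents: +1
```

The other layers in the table (content, data, style, assets, runtime, render) would each need their own extractor. The manifest layer is the cheapest to build because it is already structured JSON.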

Observation 4

Observation 4: A live-editing and collaboration layer is emerging around AI-emitted HTML.

Three projects, three different shapes of "editor":

Claude Design (Anthropic Labs, launched April 17, 2026) — Claude generates production-ready HTML / CSS / JS from natural-language descriptions inside a chat-and-preview interface. Renders in a preview pane alongside the conversation; standalone HTML is exportable. Single-user iteration loop (you + Claude).

html-docs.com (Raunaq Bhutoria) — agents publish HTML pages; humans review with inline comments; agents read the comments and revise. The async agent ↔ human review loop named as a workflow primitive.

Workplane (Matan / matanrak) — same shape as html-docs; MCP-first; multi-agent (Claude Code, Codex, Cursor, Devin, Claude Desktop). Open-source agent skill at work-plane/workplane-skills (MIT). Independent convergence with html-docs.com on the same publish → comment → revise loop — F22 in the research log.

Capsule sits downstream of all three. It's the seal step, not the edit step. The format is editor-agnostic: a Capsule can swallow output from Claude Design, from html-docs, from Workplane, from a hand-written compiler script — all the same envelope shape. The producer kind (llm / compiler / hybrid / human) is declared in the manifest; the editor that produced it is otherwise invisible.

Capsule deliberately stays out of the editor business — same way it stays out of the host business. A registry like MinDev could plausibly host lightweight commenting on hosted Capsules (read-only annotations alongside the bytes, not co-editing of the bytes), but that's a registry concern, not a format concern.

Observation 5

Observation 5: Share links don't compose across LLM platforms.

Share links exist on every chat platform — ChatGPT, Claude, Gemini, the rest. They work for viewing: anyone with the URL can read the conversation in the platform that minted it. But the conversation stays locked to the host. Formatting, navigation, the ability to continue the thread — all of it lives inside whichever app made the link.

Hand a ChatGPT share link to Claude (or the reverse). It doesn't compose: no structured conversation, no message boundaries, no thread state, no way to pick up where the other one left off. Share links are read-only viewers, not portable artifacts. A Capsule, by contrast, opens the same in any browser and can be handed to any LLM with full context preserved — because the file is the artifact. That's the practical difference between "I shared a link" and "I shared a file."

01 / 09

Discipline · research-led

A discipline for HTML artifacts.

AI tools emit HTML by default. Capsule is the contract that makes those artifacts durable, inspectable, and re-readable years from now.

Read the spec → View on GitHub

12 rules

5 required blocks

0 network deps

02 / 09

Outcome-led

Sealed HTML. Built to last.

One file. No network. Built to be re-opened years from now — by humans, and by the LLMs they ask to continue the work.

Read the spec →

03 / 09

Substrate · question

HTML is the substrate. Now what?

An open spec for what to do with the HTML that AI tools already give you — so it survives the chat it was born in.

Read the spec →

04 / 09

Contract · interop-first

A contract for HTML artifacts.

Manifest. Provenance. Integrity. No network. The same envelope whether an LLM, a build script, or a human writes it.

Read the spec → Full spec

05 / 09

Max-short · benefit-led

HTML you can keep.

Sealed. Self-contained. Provenance-bearing. One file you can email, archive, or fork.

Read the spec →

06 / 09

Comparison · familiar anchor

Like a PDF, but inspectable.

PDFs are sealed and closed. Capsules are sealed and open — programmable, inspectable, re-renderable. The HTML profile for work worth preserving.

Read the spec →

07 / 09

Visual-led · brand-first

A profile of HTML for sealed, self-contained artifacts. One file. Twelve rules. Zero network.

Read the spec →

08 / 09

Problem · why-this-exists

Where does AI's HTML go?

Capsule is the discipline that makes AI-generated HTML durable, inspectable, and shareable beyond the platform that produced it.

Read the spec →

09 / 09

Context-led · plain-English on-ramp

HTML you can keep.

AI tools increasingly produce HTML: reports, maps, demos, dashboards, visual notes, and interactive documents.

HTML Capsule is an open spec for saving that work as one self-contained .html file. A Capsule includes its own content, styles, metadata, provenance, and validation rules, with no required network access.

It is not a new file format. It is a stricter way to package HTML so the work can be archived, shared, inspected, and continued later — by humans, tools, and LLMs.

One file. No network. Built to last.

Read the spec → View examples
