### 6.2 Supported Asset Types

| Category | MIME Types |
|----------|------------|
| Images | `image/png`, `image/jpeg`, `image/svg+xml`, `image/webp` |
| Audio | `audio/mpeg`, `audio/ogg`, `audio/wav` |
| Icons | `image/svg+xml` |
| Fonts | `font/woff2` (embedded via CSS `@font-face`) |

### 6.3 Size Limits

| Threshold | Behavior |
|-----------|----------|
| < 2 MB | Normal. No warnings. |
| 2–5 MB | Compiler warns. Author confirms. |
| 5–15 MB | Compiler requires explicit `--allow-large` flag. |
| > 15 MB | Blocked. Asset must be excluded or downsampled. |

These limits apply to the **total file size** of the compiled capsule. In practice, structured JSON data is compact: size limits are almost always hit by embedded binary assets (images, audio, fonts), not by the data snapshot itself. A capsule with thousands of short JSON records and no images will likely stay well under 2 MB; at 100,000 records, staying under 2 MB requires records that average about 20 bytes or less.

### 6.4 Fallback Behavior

If an asset cannot be embedded (size, format, or encoding failure), the compiler must:

1. Insert a placeholder element with `class="capsule-asset-placeholder"`
2. Include the original filename and description as text content
3. Log the exclusion in `manifest.compilation.warnings`

---

## 7. Response and Feedback Schema

When a capsule supports the `export_response` capability, responses must follow this schema.

### 7.1 Response Envelope

```json
{
  "response_schema_version": "0.1.0",
  "capsule_reference": {
    "capsule_version": "1.0.0",
    "uuid": "3b31cb55-9bd2-4d37-86dd-7a14ac5cbaf6",
    "snapshot_id": "snapshot:sn_001"
  },
  "response": {
    "type": "annotation",
    "created_at": "2026-05-16T14:30:00Z",
    "created_by": "recipient",
    "payload": { }
  }
}
```

### 7.2 Response Types

| Type | Payload Structure |
|------|-------------------|
| `annotation` | `{ "record_id": "rec_001", "note": "...", "field": "..." }` |
| `ranking` | `{ "ranked_items": [{ "record_id": "rec_001", "rank": 1 }] }` |
| `selection` | `{ "selected": ["rec_001", "rec_005"], "reason": "..." }` |
| `decision` | Single: `{ "decision": "approved", "conditions": "...", "notes": "..." }`; or multi-record: `{ "decisions": [{ "record_id": "rec_001", "verdict": "approve", "note": "..." }], "summary_verdict": "approved", "summary_notes": "..." }`. Per-record entries must include `record_id` and at least one of `verdict` or `note` (note-only entries are valid: they capture comments on records the recipient didn't judge). |
| `feedback` | Flexible shape. Recommended fields: `rating`, `comments`, `suggestions`, `position`, `most_important_issue`, `notes`. Additional fields permitted; feedback takes many forms (rating, structured form response, multi-question survey, etc.). |
| `form_data` | `{ "fields": { "field_name": "value", ... } }` |
| `freeform` | `{ "content": "...", "format": "markdown" }` |
| `patch` | `{ "operations": [{ "op": "replace", "path": "/records/0/title", "value": "..." }] }`: a JSON Patch ([RFC 6902](https://datatracker.ietf.org/doc/html/rfc6902)) array of operations against the capsule's data. Useful for corrections to records (genealogy, document review, data cleanup). The recipient is proposing changes, not asserting authority; the author still reviews before applying. |

### 7.3 Validation Rules

The import workflow must validate:

1. `capsule_reference.uuid` is looked up in the registry.
   - *If found:* full validation, including per-record content-hash comparison for stale-response detection.
   - *If not found:* degraded validation (envelope structure, sanitization, schema conformance only), and the user is prompted to register the referenced capsule. The registry is a trust amplifier, not a gate (see Section 11.4).
2. `capsule_reference.capsule_version` matches or is noted as outdated
3. `response.type` is a recognized type
4. `response.payload` conforms to the type's expected structure
5. No executable content in any string field (strip `
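
Checks 1 to 4 can be sketched in Python as follows. This is a minimal, non-normative sketch: the in-memory `registry` dict (UUID to registered capsule record) and the helper names `validate_response` and `check_decision_payload` are hypothetical, a version comparison stands in for check 2, and neither the per-record content-hash comparison of full validation nor the executable-content check (item 5) is shown. For check 4 it implements only the multi-record `decision` rule from 7.2:

```python
RESPONSE_TYPES = {"annotation", "ranking", "selection", "decision",
                  "feedback", "form_data", "freeform", "patch"}


def check_decision_payload(payload: dict) -> list:
    """Per-record rule from 7.2: each entry needs record_id plus verdict or note."""
    problems = []
    if "decisions" in payload:
        entries = payload["decisions"]
        if not isinstance(entries, list):
            return ["'decisions' must be a list"]
        for i, entry in enumerate(entries):
            if not isinstance(entry, dict):
                problems.append(f"decisions[{i}] must be an object")
                continue
            if "record_id" not in entry:
                problems.append(f"decisions[{i}] is missing record_id")
            if "verdict" not in entry and "note" not in entry:
                problems.append(f"decisions[{i}] needs a verdict or a note")
    elif "decision" not in payload:
        problems.append("decision payload needs 'decision' or 'decisions'")
    return problems


def validate_response(envelope: dict, registry: dict):
    """Return (mode, problems); mode is "full" when the UUID is registered."""
    problems = []
    ref = envelope.get("capsule_reference") or {}
    response = envelope.get("response") or {}

    # Check 1: the registry lookup selects full vs. degraded validation.
    # It amplifies trust but never blocks the import.
    registered = registry.get(ref.get("uuid"))
    mode = "full" if registered is not None else "degraded"

    # Check 2: note, rather than reject, responses to another capsule version.
    if registered is not None and registered.get("capsule_version") != ref.get("capsule_version"):
        problems.append("capsule_version differs from the registry (response may be outdated)")

    # Check 3: recognized response type.
    rtype = response.get("type")
    if rtype not in RESPONSE_TYPES:
        problems.append(f"unrecognized response.type: {rtype!r}")

    # Check 4: payload structure (only the decision rule is sketched here).
    payload = response.get("payload")
    if not isinstance(payload, dict):
        problems.append("response.payload must be an object")
    elif rtype == "decision":
        problems.extend(check_decision_payload(payload))
    return mode, problems
```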


---

## Appendix E: v0.4 Candidates

Items queued for the next minor revision. None are committed; each is listed with the question it answers and the lean position from current discussion. Schema and validator changes do not happen here; Appendix E is the parking lot for design decisions that need to be made before they ship.

### E.1 Remove deprecated `capsule_id` slug and `related[]`

Both fields were deprecated in v0.3 with an informational validator note. v0.4 should remove them from the schema entirely. The accept-but-warn period gives existing capsules a migration window; v0.4 closes it.

Action: drop the fields from `spec/manifest.schema.json`, remove the deprecation paths from `compiler/validate.py`, and update the spec field tables.

### E.2 Compiler-kind UUIDv5 carve-out

**Question.** Should `generator.kind: "compiler"` capsules be allowed (or required) to use UUIDv5 (name-based, SHA-1) rather than UUIDv4 (random)?

**Lean.** Yes, allow v5 for `kind: "compiler"`; keep v4 mandatory for `kind: "llm" | "human" | "hybrid"`. For deterministic producers, UUID-as-identity is a stronger contract than v4 + integrity-hash: two compilers given the same canonical inputs produce the same logical capsule and should land on the same UUID, simplifying registry deduplication and rebuild idempotency.

**Namespace convention (lean: shared namespace).** Two shapes were considered:

1. **Single shared "compiled-capsule" namespace UUID + canonical input string.** One well-known constant in the spec. The input string format is normative: e.g., `|`. Cross-compiler interop is automatic: two compilers producing the same logical artifact land on the same UUID.
2. **Per-domain namespace UUIDs.** More structure; domain authors declare and stabilize their own namespace.

Option 1 is simpler and the canonical-input-string discipline is the load-bearing part.
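
Under option 1, derivation is a single name-based hash. A minimal Python sketch, assuming a hypothetical namespace constant and a hypothetical `domain|input_digest` input string (both remain open items, so neither is normative). The compiler's own name is deliberately left out of the string so that different compilers given the same inputs converge on one UUID; `uuid_version_allowed` encodes the lean "allowed, not required" reading of the v5 rule:

```python
import uuid

# HYPOTHETICAL namespace: E.2 has not picked the real constant. Deriving it
# from a reserved .invalid URL keeps it obviously non-normative.
COMPILED_CAPSULE_NS = uuid.uuid5(uuid.NAMESPACE_URL,
                                 "https://example.invalid/compiled-capsule")


def compiler_capsule_uuid(domain: str, input_digest: str) -> uuid.UUID:
    """UUIDv5 (name-based, SHA-1) from a canonical input string.

    HYPOTHETICAL format "domain|input_digest". A real format would also need
    escaping rules if a field can itself contain "|"."""
    canonical = "|".join([domain, input_digest])
    return uuid.uuid5(COMPILED_CAPSULE_NS, canonical)


def uuid_version_allowed(capsule_uuid: str, generator_kind: str) -> bool:
    """Lean position: v5 allowed (not required) for "compiler"; v4 elsewhere."""
    version = uuid.UUID(capsule_uuid).version
    if generator_kind == "compiler":
        return version in (4, 5)
    return version == 4
```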
The `compiler` tier already implies "deterministic inputs"; bake that into one namespace constant rather than fragmenting per domain.

**Open before shipping.** (a) Pick the namespace UUID. (b) Specify the canonical input string format normatively. (c) Update schema/validator to accept v5 conditionally on `generator.kind`. (d) Decide whether v5 is *allowed* or *required* for compiler-kind.

### E.3 Reconsider `ai_usage_guidance` in domain capsules

**Question.** Should `domain.implementation_notes` and `domain.design_system` keep the `ai_usage_guidance` field (`allowed_tasks`, `restricted_tasks`, `preferred_language`)?

**Lean.** Demote or remove. The field is editorial intent dressed up as structured metadata. Consumers, human or AI, can't enforce `restricted_tasks`, and `preferred_language` is style guidance that belongs in `description` prose. Existing schema fields (`description`, `caveats`) already carry editorial concerns. Adding more schema slots for "AI should/shouldn't" invites producers to encode wishes as contracts.

**Options.** (a) Cut the field from both domain schemas. (b) Move it under the `x-` extension prefix so consumers treat it as opt-in vendor metadata, not a load-bearing standard field. (c) Keep but document explicitly as advisory/non-enforceable.

### E.4 Hash-algorithm flexibility

The schema's hash pattern already accepts `sha384` and `sha512`, but the validator and reference compiler only emit/verify `sha256`. v0.4 candidate: either implement `sha384`/`sha512` end-to-end, or restrict the pattern to `sha256` until there's a concrete use case for the longer digests.

Lean: restrict the pattern. Premature flexibility confuses more than it helps, and widening the pattern later is purely additive.

### E.5 Rule 12 vs. legacy compiler templates

**Question.** Rule 12 (added in Core v0.1.3) says readable content should be pre-rendered in the HTML, not produced by runtime JavaScript.
The reference compiler templates `templates/decision_board` and `templates/news_capsule` predate this rule and still render primary content via the runtime. The validator's heuristic passes them because it counts *surrounding* static UI text (headings, buttons, labels) and finds enough, but the *primary* content (the decision options, the article body) is still injected by JS at load time.

**Two paths.**

1. **Tighten Rule 12 enforcement.** The validator measures specifically the *data-bearing* content (records rendered into the DOM), not all visible text. Current compiler templates would fail; we'd fix them by emitting pre-rendered record markup at build time (the data block stays the source of truth; the rendered DOM mirrors it). This is the principled move and matches what LLM-produced capsules in the corpus already do.
2. **Accept the templates as legacy.** They're v0.1 artifacts; the format has moved past them. The compiler-kind generator-of-record going forward is Mintel-style build scripts (which DO pre-render). Stop using the reference templates as the canonical compile-path examples and document them as historical.

**Lean: option 1** for principled consistency, but acknowledge it requires real template rework. If option 2 wins, we should update Phase 2 status to reflect that the "compiler" is a *category* of producer (any deterministic build script) rather than these specific templates.

**Open before shipping.** (a) Decide whether to tighten or to retire. (b) If tighten: design the validator check that measures data-bearing content specifically. (c) If retire: write the deprecation note in `templates/README` and ensure the corpus index still works without them as references.

### E.6 Author signing + transparency log (Sigstore-shaped)

**Question.** How does a recipient verify that a capsule they received hasn't been silently tampered with by someone in the forwarding chain?
A capsule's UUID asserts "this is identifier X," but UUIDs are not enforced: anyone can ship a modified capsule under the same UUID. The current `integrity.content_hash` detects tampering only if the recipient knows what hash to expect, which they typically don't.

**The unanswered trust question.** The current spec answers *what is this?* and *where does it claim to come from?* It does not answer *did the claimed author actually publish these exact bytes?* The current trust signals are honest about what they prove, but the missing question is the one a forwarded-capsule recipient most wants answered.

**Design sketch (not committed).** Three trust tiers, layered:

| Tier | Meaning |
|---|---|
| **Self-describing** | UUID + manifest + content_hash, no external proof. Adequate for personal archives. This is the v0.3 baseline; no change required. |
| **Signed** | `content_hash` (and `file_hash`, see below) signed by an author key, identity-anchored via Sigstore/Fulcio-style OIDC issuance. Detects tampering if you trust the issuing CA. |
| **Logged** | Signed release recorded in an append-only public transparency log (Sigstore/Rekor shape). Detects tampering, backdating, and same-UUID-different-content games across the forwarding chain. |

**Two-hash split.** Two hashes serve two different questions:

- `content_hash`: canonical(manifest) + LF + canonical(data), already specified in §9.1.1. Survives DOM round-trip (the JSON blocks are raw-text script-tag content browsers don't normalize). Answers: *is the meaningful payload intact?*
- `file_hash`: SHA-256 of the raw `.html` file bytes (with the hash field placeholder-substituted per the existing recipe, the same protocol as `hash_scope: "full_document"`). Does NOT survive DOM round-trip. Answers: *is this byte-identical to the file the author originally published?*

The current `hash_scope` enum collapses these two questions into one choice per integrity block.
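
A minimal sketch of how the two hashes behave differently. It assumes canonical JSON means sorted keys with compact separators (the normative recipe is the one in §9.1.1, which this does not reproduce, including how the integrity field inside the manifest is treated) and uses a hypothetical all-zero placeholder for the hash-field substitution. `content_hash` depends only on parsed JSON, so key order and surrounding markup do not affect it; `file_hash` changes with any byte of the file:

```python
import hashlib
import json


def canonical_json(obj) -> str:
    # ASSUMPTION: sorted keys + compact separators stand in for the
    # canonicalization §9.1.1 actually specifies.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def content_hash(manifest: dict, data: dict) -> str:
    """SHA-256 over canonical(manifest) + LF + canonical(data).

    Depends only on the parsed JSON, so it survives DOM round-trips and
    any re-serialization of the surrounding HTML."""
    payload = canonical_json(manifest) + "\n" + canonical_json(data)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# HYPOTHETICAL placeholder standing in for the existing substitution recipe.
HASH_PLACEHOLDER = "0" * 64


def file_hash(html_bytes: bytes, recorded_hash: str) -> str:
    """SHA-256 of the raw file bytes, with the embedded hash value swapped
    back to the placeholder so a file can carry its own hash.

    Any byte-level change (whitespace, re-encoding) changes the result."""
    neutral = html_bytes.replace(recorded_hash.encode("ascii"),
                                 HASH_PLACEHOLDER.encode("ascii"))
    return hashlib.sha256(neutral).hexdigest()
```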

The two-hash split would let a capsule carry both simultaneously and resolve the tension we already documented in §5.1.1 between `download_capsule` and `full_document` integrity. The integrity block would grow to `{ content_hash, file_hash?, hash_scope, signature?, log_entry_uuid? }`.

**Verification model: out-of-band, capsule stays mute.**

- The capsule itself never calls home. **Rule 2 (no network) is preserved at render time.** This is non-negotiable.
- Every capsule already embeds a QR code encoding `urn:uuid:` in the header (Core spec convention; see CAPSULE_CORE.md rule 4 supplementary guidance).
- A verifier app on the recipient's phone or reader scans the QR, resolves the UUID to a verification URL, and queries the transparency log. The app reports: **verified** / **modified** / **unknown** / **superseded** / **identity warning**.
- Verification is explicitly opt-in friction. The capsule opens and works without it. The recipient who cares about provenance takes an active step; the recipient who doesn't is unaffected.

This trades a small amount of UX friction for the preservation of the no-network guarantee. The alternative, a capsule that auto-verifies on open, would break Rule 2 and turn every recipient view into a network call against a log operator. Not worth it.

**Precedents to compose, not duplicate:**

- **Sigstore / Rekor**: the model. Append-only transparency log for signed software artifacts, identity-anchored via OIDC issuance through Fulcio. The "Logged" tier would record signed-release statements about specific UUIDs against this existing infrastructure rather than running our own log. Reference: .
- **C2PA / Content Credentials**: relevant trust-model patterns for signed content provenance, especially around metadata-stripping defense. Reference: .

**Hard problems to resolve before any shipping decision:**

1. **Author identity.** "Signed by author key" only works if recipients know which key to trust.
   Without OIDC/Fulcio-style identity issuance, the modifier signs their tampered version with their own key and the verifier sees "signed by some key, who knows whose." This is the entire trust model, not a small implementation detail.
2. **Log operator and governance.** Sigstore is operated by the Linux Foundation. For Capsules, the strong move is to *compose* existing infrastructure (record arbitrary signed JSON statements against existing Rekor) rather than run our own log. Investigate whether Rekor accepts non-software artifact statements before committing.
3. **Two-hash compatibility.** Adding `file_hash` is additive but changes the `hash_scope` semantics. Decide whether `hash_scope: "full_document"` becomes redundant (replaced by always-present `file_hash`) or stays as a third explicit choice.
4. **Empirical pressure.** No real-world tampering incident has been reported in the capsule corpus. Per the spec-gravity discipline: wait for empirical signal before building infrastructure.

**When this earns a v0.4+ schema slot.** Design and ship in a single coordinated patch when *any* of:

1. A real-world capsule-tampering incident is reported (corpus or independent producer);
2. An independent producer or recipient requests verification primitives concretely;
3. A practical Sigstore composition path appears that meaningfully reduces the infrastructure cost.

Until then, this entry is the design memory.

### E.7 Password-protected encrypted capsules

**Question.** Should the format support encrypting a capsule's content with a password so that only recipients with the password can read it? The motivating use case is sensitive personal data, client confidentiality, or selective sharing where the existing redaction primitive isn't sufficient.

**Lean: don't build, advise wrappers.** The format already has the right primitive for "don't share this content": `privacy.redaction_applied` with `redaction_method` and `redaction_profile`.
The intended model is: **decide what's shareable before sealing, redact what isn't, then seal.** Encryption pulls capsules toward "selective-access messaging," which is a different problem space better served by:

- **OS/wrapper-level encryption** (AES-encrypted ZIP, `.age`-encrypted wrappers, `gpg`-encrypted files) around the capsule. Recipient unlocks the wrapper, opens the capsule. The capsule itself stays pure and validator-clean.
- **Hosting-platform auth gates** (per the MinDev pattern in the §11 hosting discussion). The platform controls *delivery*; the capsule itself doesn't gate its internal contents.
- **Authenticated channels** for transport (Signal, encrypted email, password-protected file storage). The capsule travels through the secure channel; the format itself stays neutral.

**Why building encryption into the format is worse than these alternatives:**

1. **Cryptography is unusually hard to ship in a spec.** Browser-native primitives (WebCrypto: PBKDF2 + AES-GCM) work, but the recipe surface is large: iteration counts, salt and IV handling, authentication-tag verification, side-channel exposure. Compare the integrity-hash recipe in §9.1.1, which was substantial work for a much simpler primitive. An encryption recipe is roughly 10× the surface, and the consequences of bugs are confidentiality breaches, not hash mismatches. Once shipped, the spec inherits a permanent maintenance obligation: when PBKDF2 iteration counts age, or when some browser's AES-GCM path turns out to allow exploitable nonce reuse, the spec has to update and every existing capsule with old parameters becomes ambiguously secure.
2. **Encryption breaks the format's core trust signals for the encrypted portion.** Until decrypted, encrypted content is neither human-readable, machine-readable, nor validator-checkable. Rule 12 (content pre-rendered in HTML) can't apply to the encrypted payload.
   `content_hash` would have to either hash ciphertext (proving nothing about meaning) or require post-decryption verification (impossible at validation time). The "memory object" framing (human-readable + machine-readable + provenance-bearing in one object) partially evaporates. The capsule has two states: **locked** (none of the three) and **unlocked** (all three).
3. **Password-only encryption is fundamentally weak.** Without a key-derivation infrastructure or hardware-backed keys, password strength is the entire security boundary. Lost passwords mean permanent data loss with no recovery path. This is the right behavior cryptographically but the wrong UX expectation most users have.
4. **No empirical pressure.** Same spec-gravity discipline as E.6. No reported case of "I needed selective access and the redaction primitive wasn't enough." The default path (redact, or wrap with OS-level encryption) has been adequate.

**Design sketch if this ever ships (memory only, not committed).** Two-tier structure preserving identity in the clear while gating content:

- **Public manifest** carries identity and provenance unencrypted: `uuid`, `title`, `description`, `type`, `created_at`, `generator`, `source`, `privacy`, `capabilities`. Plus a new `encryption` block:

  ```json
  "encryption": {
    "algorithm": "AES-256-GCM",
    "kdf": "PBKDF2-SHA256",
    "kdf_iterations": 600000,
    "salt": "",
    "iv": "",
    "encrypted_blocks": ["capsule-data", "capsule-runtime"]
  }
  ```

- **Encrypted payload**: each block listed in `encrypted_blocks` is base64-encoded ciphertext in place of its normal raw-text content. A small "unlock" runtime prompts for the password, derives the key via PBKDF2, decrypts each block, parses it, and replaces document content.
- **New required capability**: `password_protected`. Declared in `capabilities`. Honest about what the capsule does before unlock.
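
To make the `encryption` block concrete, here is a minimal Python sketch of the key-derivation half of the unlock step. It assumes `salt` is stored base64-encoded (the design sketch leaves its encoding open), uses the block's own `kdf_iterations` value as a hypothetical minimum, and `derive_unlock_key` is an illustrative name. It stops before decryption because Python's standard library has no AES-GCM; a browser runtime would use WebCrypto for that step. One constraint the sketch makes visible: one salt yields one key, and AES-GCM must never reuse an IV under the same key, so encrypting `capsule-data` and `capsule-runtime` separately would need a distinct IV per block rather than the single `iv` field shown:

```python
import base64
import hashlib

# HYPOTHETICAL floor, taken from the kdf_iterations value in the design sketch.
MIN_ITERATIONS = 600_000


def derive_unlock_key(password: str, encryption: dict,
                      min_iterations: int = MIN_ITERATIONS) -> bytes:
    """Derive the 256-bit AES key described by a capsule `encryption` block.

    Decryption itself (AES-256-GCM with a per-block IV) is not shown."""
    if encryption.get("algorithm") != "AES-256-GCM":
        raise ValueError(f"unsupported algorithm: {encryption.get('algorithm')!r}")
    if encryption.get("kdf") != "PBKDF2-SHA256":
        raise ValueError(f"unsupported kdf: {encryption.get('kdf')!r}")
    iterations = int(encryption["kdf_iterations"])
    if iterations < min_iterations:
        # Parameter aging is the versioning problem; this sketch refuses
        # settings below the floor instead of silently accepting them.
        raise ValueError(f"kdf_iterations {iterations} is below {min_iterations}")
    # ASSUMPTION: salt is base64-encoded; the design sketch leaves this open.
    salt = base64.b64decode(encryption["salt"], validate=True)
    if len(salt) < 16:
        raise ValueError("salt shorter than 16 bytes")
    # PBKDF2-HMAC-SHA256 with dklen=32 yields a key sized for AES-256.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt,
                               iterations, dklen=32)
```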

**Trust signals that survive encryption:**

- Identity (UUID, title, type): still verifiable
- Provenance (generator, source): still readable
- Capabilities: still declared
- E.6 signing/log: could sign the *full* artifact (envelope + ciphertext), so recipients can verify the encrypted bytes came from the claimed author *before* attempting decryption

**Trust signals that don't:**

- `content_hash` over data+manifest: meaningless (the data is opaque to the validator)
- Rule 12 pre-rendered content check: can't apply
- Capability-marker heuristics: can't see runtime code

**Hard problems to resolve before any shipping decision:**

1. **Crypto parameters as a versioning problem.** PBKDF2 iteration counts that are appropriate in 2026 will be inadequate in 2030. The spec needs a version field on the encryption block and a deprecation policy. This is real maintenance cost.
2. **Recovery story.** Lost passwords = permanent data loss. Capsules that compose with key-escrow or split-secret schemes would help but multiply the complexity.
3. **Interaction with E.6 signing/log.** Sign before or after encryption? Sign envelope-only, or include ciphertext in the signed scope? Decide before either lands.
4. **What "capabilities" means under encryption.** A capsule that declares `download_capsule` can't actually implement the button until after unlock. Honesty-of-capabilities (Rule 7) gets murky.
5. **Empirical pressure.** Same as E.6: wait for a reported case where redaction-plus-channel-security isn't sufficient.

**When this earns a v0.4+ schema slot.** Design and ship when *all three* of:

1. A producer or recipient reports a concrete case where redaction + OS-level encryption + authenticated-channel delivery is insufficient;
2. The interaction with E.6 (signing/log) is resolved in design first: encryption and signing have to compose cleanly;
3. A practical answer to the parameter-versioning maintenance burden exists (probably: defer to an external standard like JWE or COSE rather than maintain our own crypto recipe).

Until then: **advise users that the right answer to "I want to share this with only one person" is to encrypt the wrapper, not the capsule.**

### E.8 Validator: distinguish resource-loading `<link>` tags from metadata-only ones

**Issue.** The current `check_no_external_references` validator pattern flags **any** `<link>` tag with a non-`data:` URI as an external resource violation:

```python
(r'<link[^>]+\bhref=["\']\s*(?!data:)[^"\']', 'External reference (capsule CSS must be inlined)'),
```

The message ("capsule CSS must be inlined") reveals the intent: catch external stylesheet imports. But the regex is too broad: it also flags **metadata-only** `<link>` tags that don't load any resource:

- `<link rel="canonical">`: SEO canonical hint, no resource loaded
- `<link rel="alternate">`: alternate-form discovery, no resource loaded
- `<link rel="prev">` / `<link rel="next">`: pagination hints, no resource loaded

These are pure metadata declarations the browser doesn't fetch. Rule 2 ("no network") doesn't apply to them.

**The right shape.** Refine the check to distinguish resource-loading rel values from metadata-only ones:

- **Resource-loading (should flag external):** `stylesheet`, `preload`, `prefetch`, `preconnect`, `dns-prefetch`, `modulepreload`, `icon` (when not a `data:` URI)
- **Metadata-only (should NOT flag):** `canonical`, `alternate`, `prev`, `next`, `author`, `license`, `help`, `bookmark`, `pingback`

**Implementation sketch:**

```python
import re


def check_external_link_tags(html_scannable):
    """Flag <link> tags that load external resources.

    Metadata-only rel values (canonical, alternate, prev/next, etc.)
    are allowed because they don't fetch."""
    LOADING_RELS = {"stylesheet", "preload", "prefetch", "preconnect",
                    "dns-prefetch", "modulepreload", "icon"}
    findings = []
    for m in re.finditer(r'<link\b([^>]*)>', html_scannable, re.IGNORECASE):
        attrs = m.group(1)
        # (?<![\w-]) stops data-href / data-rel from matching as href / rel.
        href_match = re.search(r'(?<![\w-])href\s*=\s*["\']([^"\']+)["\']',
                               attrs, re.IGNORECASE)
        if not href_match:
            continue
        href = href_match.group(1).strip()
        if href.lower().startswith("data:"):
            continue  # inline data: URIs never touch the network
        rel_match = re.search(r'(?<![\w-])rel\s*=\s*["\']([^"\']+)["\']',
                              attrs, re.IGNORECASE)
        # No rel attribute: conservatively treat the tag as a stylesheet load.
        rel_values = (rel_match.group(1).lower().split()
                      if rel_match else ["stylesheet"])
        if any(r in LOADING_RELS for r in rel_values):
            rel_text = " ".join(rel_values)
            findings.append(f'External <link rel="{rel_text}" href="{href}">')
    return findings
```

**Why it's not built yet.** Empirically discovered while wiring `htmlcapsule.org` as the canonical home for the spec: adding `<link rel="canonical">` to a valid capsule made the validator fail it. The canonical link tag was removed for now to keep the validator green. The data-block `links.canonical` field carries the same information machine-readably.

**When this earns a v0.4 schema slot.** Whenever the validator gets its next focused pass: this is a small, contained change that doesn't affect any other rule. Could ship as a v0.3.x validator patch if not bundled with anything else. Low risk: no existing valid capsule uses `<link rel="canonical">` (it would already have failed), so the refinement is purely additive; the same capsules still pass, and canonical-link-bearing ones start passing too.
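
The regex sketch reads attributes by pattern matching, so it misses unquoted values such as `href=style.css`. As an alternative, not a description of the current validator, Python's standard `html.parser.HTMLParser` already tokenizes start tags, lowercases tag and attribute names, and accepts unquoted and self-closing forms. A sketch with the same rel classification and the same conservative no-`rel` default (`LinkTagAuditor` and `audit_link_tags` are hypothetical names):

```python
from html.parser import HTMLParser

LOADING_RELS = {"stylesheet", "preload", "prefetch", "preconnect",
                "dns-prefetch", "modulepreload", "icon"}


class LinkTagAuditor(HTMLParser):
    """Collect (rel, href) for <link> tags that would fetch a non-data: URL."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; `<link ... />` also
        # arrives here through the default handle_startendtag.
        if tag != "link":
            return
        attr_map = {}
        for name, value in attrs:
            attr_map.setdefault(name, value or "")  # first duplicate wins, as in browsers
        href = attr_map.get("href", "").strip()
        if not href or href.lower().startswith("data:"):
            return
        # Same conservative default as the regex sketch: missing rel = stylesheet.
        rel_values = attr_map.get("rel", "").lower().split() or ["stylesheet"]
        if any(r in LOADING_RELS for r in rel_values):
            self.findings.append((" ".join(rel_values), href))


def audit_link_tags(html_text):
    auditor = LinkTagAuditor()
    auditor.feed(html_text)
    auditor.close()
    return auditor.findings
```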