Refetch & validate
Refetch
A capture has three independently-attemptable dimensions. khiipd refetch lets you
re-run any one of them:
khiipd refetch 01JX9... # re-extract: runs the extractor again and creates # a new capture that supersedes the old onekhiipd refetch 01JX9... --media # re-walk the media-fetcher registry in placekhiipd refetch 01JX9... --wayback # re-submit to the Wayback Machine in placeThree dimensions
| Command | Dimension | Effect |
|---|---|---|
khiipd refetch <id> | extraction (default) | Re-runs the extractor against the original URL and writes a new capture; the old one is marked superseded (below). |
khiipd refetch <id> --media | media | Re-walks the media-fetcher registry on the existing capture in place — same id, same vault file. For retrying a media download that previously failed. |
khiipd refetch <id> --wayback | wayback | Re-submits the canonical URL to the Wayback Machine in place, updating the capture’s archive_urls. |
--media and --wayback are mutually exclusive. All three map to
POST /api/v1/captures/{id}/refetch?dimension={extraction|media|wayback}, and the dimensions
are independently re-attemptable per ADR-0010 — a failed
media fetch doesn’t force you to re-run the whole extraction.
The supersession chain
Re-extraction is append-only. Rather than overwriting the old capture, Khiip:
- Runs a fresh capture (bypassing dedup) through the full pipeline — extraction, media-fetch, Wayback, Source-tier, vault write, SQLite, embeddings.
- Sets a
superseded_bypointer on the old capture’s row, pointing at the new id.
The old capture and its vault file stay on disk, so the bitemporal history is preserved —
you can still answer “what did this say when I first captured it.” A later dedup (or a
re-capture of the same URL) resolves to the most-recent un-superseded capture — the one
whose superseded_by is empty. The CLI prints both ids on success:
✓ re-extracted; new capture id: 01JXA… old (superseded): 01JX9…The in-place dimensions (--media / --wayback) don’t create a new capture or touch the
supersession chain — they re-render the existing capture’s body and update its payload.
Validate
khiipd validatevalidate is a read-only consistency check between your Markdown vault (canonical) and
the SQLite index (a derived cache) per ADR-0009 §C3. It
never mutates either side. It checks four invariants:
- Vault ↔ SQLite reconciliation. Every index row has a matching vault file and vice
versa — matched by the
id:frontmatter key, not the filename. Catches a manual delete, a half-finished restore, or a vault file with noid:. - Payload round-trip. Every vault file’s
structured_payloaddeserializes back through the typed discriminated union — guards against schema drift between extractor versions. - Status consistency. A
successcapture has no error recorded; afailed-permanentcapture does (the failure reason is the only signal on a tombstone). Per ADR-0010. - Media sandbox. Every
media.local_pathresolves to a real file inside the vault — no out-of-vault paths, no dangling references.
It exits 0 when everything passes and 1 when there are violations, printing a
per-invariant summary:
checked 142 captures + 142 vault files✓ all invariants passAdd --json for a machine-readable report (handy in a backup-restore or CI script), or
--vault-path / --db-path to point at a non-default location.
Run it after an upgrade, after restoring a backup, or any time the vault and index might have drifted.