Installation
Requirements
- Python 3.10+ (the substrate is tested on CPython; the bundled embedder ships as ONNX)
- ~80MB free for the one-time MiniLM-L6 model fetch on first
khiipd serve - Optional per-source credentials (see below) for sources that require your identity
Install from source
git clone https://github.com/KhiipAI/khiip.git ~/projects/khiipcd ~/projects/khiippip install -e ".[dev]"This installs the khiipd CLI and the khiip-mcp-server console script.
Default paths
| Purpose | Path | Override |
|---|---|---|
| Config | ~/.config/khiip/ | XDG_CONFIG_HOME |
| Data (SQLite index) | ~/.local/share/khiip/ | XDG_DATA_HOME |
| Vault (Markdown captures) | ~/khiip-vault/ | [daemon] vault_path |
See Configuration for the full picture, including the
KHIIP_HOME test-only knob and why you should not set it in your shell rc.
Per-source credentials
Khiip plumbs between your upstream accounts and your local vault — it never holds upstream credentials of its own. Most sources work with no setup (X via fxtwitter, generic web via trafilatura, Wikipedia, Reddit via old.reddit HTML). A few accept your own platform credentials so requests go through your identity and your rate-limit budget for higher fidelity.
Reddit (optional — capture works credential-free)
Reddit captures work out of the box via the credential-free old.reddit.com HTML channel (live threads + full comment trees, self-hosted on your own IP). Configuring your own Reddit app is an optional upgrade: it adds cleaner deep-comment pagination plus a 60 req/min headroom and is tried first when present.
-
Visit https://www.reddit.com/prefs/apps → create another app… → choose script type → name it (e.g.
khiip), set redirect urihttp://localhost:8478/(unused but required by the form) → create app. -
Copy the client ID (the short string under the app name) and client secret.
-
Add them to
~/.config/khiip/config.toml:[extractors.reddit]client_id = "your-reddit-client-id"client_secret = "your-reddit-client-secret"…or set
KHIIP_REDDIT_CLIENT_IDandKHIIP_REDDIT_CLIENT_SECRETbeforekhiipd serve.
If a configured app is missing or expired, capture transparently falls back to the
HTML channel rather than failing. The HTML channel is self-host-only by design (a
datacenter IP would hit Reddit’s WAF and robots.txt).
YouTube (optional; widens the fallback chain)
Set a free Google Data API v3 key to add a third fallback source for metadata resilience when yt-dlp and youtube-transcript-api both fail:
[extractors.youtube]api_key = "your-google-data-api-v3-key"Verify
khiipd versionkhiipd serve &khiipd capture https://en.wikipedia.org/wiki/Quipukhiipd recall "Inca knot record system"If the capture lands in ~/khiip-vault/ and recall returns it, you’re set. Next:
capture guides or wire up the MCP server.