Longterm Wiki
Updated 2026-03-13
Summary

Documentation for the wiki's multi-layer data storage architecture. The wiki-server PostgreSQL database stores citation content, audit results, claims, facts, and other structured data, accessed via the Hono RPC API. An in-memory LRU cache provides session-local caching for source fetching. YAML files in data/ define entities, facts, and resources. The build pipeline compiles YAML + MDX into database.json for the Next.js frontend.

Content Database System

The wiki uses a multi-layer storage architecture. There is no single database — different kinds of data live in the storage layer best suited for them.


Storage Layers

[Diagram: storage layers]

1. PostgreSQL (wiki-server)

The wiki-server runs a PostgreSQL database that stores all structured data requiring durability and cross-machine access. This replaced the earlier local SQLite database (.cache/knowledge.db), which was retired in February 2026.

What it stores:

Table                  Purpose
citation_content       Full text of fetched source URLs (for quote verification)
citation_audits        Per-page citation verification results
claims                 Extracted atomic claims with source references
facts                  Canonical facts with values and computed expressions
resources              External resource metadata (papers, blogs, reports)
entities               Entity metadata synced from YAML
agent_sessions         Claude Code session logs
edit_logs              Per-page edit history
hallucination_evals    Hallucination detection results

Access pattern: All access goes through the wiki-server's Hono RPC API. CLI tools use apiRequest() from crux/lib/wiki-server/. The frontend uses typed RPC clients (e.g., getFactsRpcClient()).

# Example CLI commands that read/write PostgreSQL
pnpm crux citations verify <page-id>    # Verify citations → writes audit results
pnpm crux query entity <id>             # Read entity data
pnpm crux query search "topic"          # Full-text search
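To illustrate what a typed RPC client hides behind method calls, here is a hypothetical sketch that maps a query object to an RPC path. The paths, query shapes, and function names below are assumptions for illustration only, not the real wiki-server contract or the actual `apiRequest()`/`getFactsRpcClient()` signatures.

```typescript
// Hypothetical query types a typed client might accept.
interface EntityQuery { kind: "entity"; id: string }
interface SearchQuery { kind: "search"; text: string }
type Query = EntityQuery | SearchQuery;

// Map a query to an (assumed) RPC path, the way a typed client hides
// raw URLs behind method calls. Not the real wiki-server routes.
function rpcPath(q: Query): string {
  switch (q.kind) {
    case "entity":
      return `/rpc/query/entity/${encodeURIComponent(q.id)}`;
    case "search":
      return `/rpc/query/search?q=${encodeURIComponent(q.text)}`;
  }
}

console.log(rpcPath({ kind: "entity", id: "E698" }));
```

The discriminated-union input is what makes the client "typed": callers cannot construct a malformed query, and the path-building logic stays in one place.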

2. In-Memory LRU Cache

Source fetching uses a session-scoped in-memory cache (crux/lib/citation-content-cache.ts) to avoid redundant network requests and database lookups within a single process.

Property       Value
Max entries    500
Eviction       Least Recently Used
Scope          Per-process (cleared on exit)
Persistence    None — purely ephemeral
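The eviction behavior above can be sketched with a Map-based LRU. This is a minimal illustration, not the actual code in crux/lib/citation-content-cache.ts; it relies on the fact that a JavaScript Map iterates keys in insertion order.

```typescript
// Minimal LRU sketch: Map preserves insertion order, so re-inserting a key
// on access keeps the least recently used key at the front.
class LruCache<V> {
  private map = new Map<string, V>();
  constructor(private maxEntries = 500) {}

  get(key: string): V | undefined {
    const value = this.map.get(key);
    if (value === undefined) return undefined;
    // Re-insert to mark this key as most recently used.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldest = this.map.keys().next().value!;
      this.map.delete(oldest);
    }
  }
}

const cache = new LruCache<string>(2);
cache.set("a", "1");
cache.set("b", "2");
cache.get("a");      // touch "a" so "b" becomes least recently used
cache.set("c", "3"); // evicts "b"
console.log(cache.get("b")); // undefined
```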

When fetching a URL, the system checks:

  1. In-memory LRU cache (fastest)
  2. PostgreSQL citation_content table (durable)
  3. Network fetch via Firecrawl or built-in fallback (slowest)

Results are written back to both the LRU cache and PostgreSQL.
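The three-tier lookup with write-back can be sketched as a fall-through over cache tiers. The `Tier` interface and function names here are illustrative assumptions, not actual crux APIs:

```typescript
// A tier can be read, and (if writable) populated on a miss.
interface Tier {
  get(url: string): Promise<string | undefined>;
  set?(url: string, content: string): Promise<void>;
}

// Check tiers in order (memory -> PostgreSQL -> network); on a hit,
// write the content back to every faster tier that missed.
async function fetchWithTiers(url: string, tiers: Tier[]): Promise<string | undefined> {
  const missed: Tier[] = [];
  for (const tier of tiers) {
    const content = await tier.get(url);
    if (content !== undefined) {
      await Promise.all(missed.map((t) => t.set?.(url, content)));
      return content;
    }
    missed.push(tier);
  }
  return undefined;
}

(async () => {
  const memory = new Map<string, string>();
  const memTier: Tier = {
    get: async (u) => memory.get(u),
    set: async (u, c) => void memory.set(u, c),
  };
  // Stand-in for the slowest tier (the real system uses Firecrawl or a fallback).
  const networkTier: Tier = { get: async () => "fetched content" };

  await fetchWithTiers("https://example.com", [memTier, networkTier]);
  console.log(memory.has("https://example.com")); // true: written back on the miss
})();
```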

3. KB YAML (packages/kb/)

The Knowledge Base package (packages/kb/) is the authoritative source for structured entity facts — valuations, revenue, headcounts, founding dates, and other typed properties. As of March 2026, 9+ entities have been migrated here from the older data/facts/ system.

Path                              Content
packages/kb/data/things/*.yaml    Entity facts with typed properties, time series, sources
packages/kb/data/schemas/         Property schemas (60 properties across orgs, people, AI models, etc.)

KB facts are rendered on wiki pages via <KBF> and <KBFactValue> components, and computed values via <Calc>. See Data System Authority Rules for which system is authoritative for which entities.

4. YAML Files (data/)

Human-editable YAML files are the source of truth for content metadata:

Directory                 Content
data/entities/            Entity definitions (type, description, relations)
data/facts/               Legacy facts (deprecated for entities migrated to KB)
data/resources/           External resource metadata
data/graphs/              Cause-effect graph data
data/edit-logs/           Per-page edit history
data/citation-archive/    Per-page citation verification YAML
data/auto-update/         Auto-update system configuration and state

YAML files are checked into git and are the canonical source for everything they contain. PostgreSQL mirrors some of this data for API access and full-text search.

5. File-System Caches (.cache/)

Temporary files for local development workflows:

Path                          Purpose
.cache/sources/               Fetched source documents (HTML, text, PDF)
.cache/content-hashes.json    MD5 hashes for change detection during scans

These are gitignored and can be deleted without data loss.
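Hash-based change detection like .cache/content-hashes.json can be sketched as follows: recompute each file's MD5, compare against the stored hash, and rescan only the files that differ. The file paths and record shapes here are illustrative, not the actual crux scan code.

```typescript
import { createHash } from "node:crypto";

type HashIndex = Record<string, string>; // path -> last-seen MD5

function md5(content: string): string {
  return createHash("md5").update(content).digest("hex");
}

// Return the paths whose content no longer matches the stored hash,
// plus an updated index to persist for the next scan.
function detectChanges(
  files: Record<string, string>,
  index: HashIndex,
): { changed: string[]; nextIndex: HashIndex } {
  const nextIndex: HashIndex = {};
  const changed: string[] = [];
  for (const [path, content] of Object.entries(files)) {
    const hash = md5(content);
    nextIndex[path] = hash;
    if (index[path] !== hash) changed.push(path);
  }
  return { changed, nextIndex };
}

const first = detectChanges({ "pages/a.mdx": "hello" }, {});
const second = detectChanges({ "pages/a.mdx": "hello" }, first.nextIndex);
console.log(first.changed.length, second.changed.length); // 1 0
```

Because the index is just a JSON-serializable map, deleting it (as with the rest of .cache/) only forces a full rescan, never data loss.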

6. Build Artifact (database.json)

The build pipeline (apps/web/scripts/build-data.mjs) compiles YAML + MDX frontmatter into apps/web/src/data/database.json. This single JSON file contains all entities, pages, relations, facts, search data, and statistics needed by the Next.js frontend.

pnpm build-data           # Full build (~2 min)
pnpm build-data:content   # Content-only rebuild (~15s)

The JSON is loaded at server startup with lazy-built indexes (see Architecture).
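"Loaded at startup with lazy-built indexes" can be sketched like this: the JSON blob is read once, but derived lookup structures are only built on first use, so startup stays fast even when most indexes are never queried. The record shapes below are simplified assumptions about database.json, not its real schema.

```typescript
// Simplified stand-in for the database.json shape.
interface Db {
  entities: { id: string; type: string }[];
}

class DataStore {
  private byType?: Map<string, string[]>; // built lazily on first query

  constructor(private db: Db) {}

  entitiesOfType(type: string): string[] {
    if (!this.byType) {
      // First call pays the indexing cost; later calls are map lookups.
      this.byType = new Map();
      for (const e of this.db.entities) {
        const ids = this.byType.get(e.type) ?? [];
        ids.push(e.id);
        this.byType.set(e.type, ids);
      }
    }
    return this.byType.get(type) ?? [];
  }
}

const store = new DataStore({
  entities: [
    { id: "E698", type: "internal" },
    { id: "E001", type: "organization" },
  ],
});
console.log(store.entitiesOfType("internal")); // ["E698"]
```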


Data Flow

[Diagram: data flow]

Source Fetching Flow

When verifying citations or fetching content for page improvement:

[Diagram: source fetching flow]

CLI Commands

Command                                 Purpose
pnpm crux citations verify <page-id>    Verify all citations on a page
pnpm crux citations audit               Run citation audits across pages
pnpm crux scan-content                  Scan MDX files for content analysis
pnpm crux query search "topic"          Full-text search via wiki-server
pnpm crux query entity <id>             Look up entity data
pnpm crux query related <id>            Find related pages
pnpm crux context for-page <id>         Full research context for a page
pnpm build-data                         Rebuild database.json from YAML + MDX
pnpm build-data:content                 Content-only rebuild (≈15s)

Limitations

  1. No offline PostgreSQL access: CLI commands that query the wiki-server require network connectivity
  2. LRU cache is session-scoped: Restarting a process loses cached content (by design — PostgreSQL is the durable tier)
  3. database.json must be rebuilt: Changes to YAML or MDX frontmatter are not visible to the frontend until build-data runs
  4. Citation content is append-mostly: Old fetched content is not automatically refreshed

See Also

  • Architecture — System overview and design patterns
  • Automation Tools — Full CLI reference
  • Data System Authority Rules — Which data system is authoritative for each entity