Longterm Wiki
Updated 2026-03-13

System Architecture

This document provides a technical overview of how the Longterm Wiki is built, the novel design patterns it uses, and the rationale behind key architectural decisions. It's intended both as a reference for contributors and as a catalog of the ideas that make the system work.

Keeping This Updated

When making significant changes to pipelines or data flow, update the relevant sections here. See Documentation Maintenance for guidelines.


High-Level Architecture

The wiki is a Next.js 15 application with a YAML-first data layer, a CLI toolchain (Crux), and an AI-assisted content pipeline.


Tech Stack

| Layer | Technology |
|---|---|
| Framework | Next.js 15 with App Router |
| Components | React 19 + next-mdx-remote |
| Styling | Tailwind CSS v4 + shadcn/ui |
| Type Safety | TypeScript + Zod schemas |
| Graphs | ReactFlow (XYFlow) + Dagre/ELK layout |
| Diagrams | Mermaid 11 |
| Search | PostgreSQL full-text search (wiki-server) |
| CLI | Crux (custom multi-domain CLI) |
| Data | KB YAML + YAML sources → JSON build artifacts |
| Workspace | pnpm workspaces (app + crux) |

Clever Ideas

This section catalogs the novel architectural patterns — the ideas that distinguish this system from a typical documentation site.

1. Multi-Signal Relationship Graph

Location: app/scripts/build-data.mjs (lines 230-424)

Instead of manually curating "related pages" links, the system computes a weighted relationship graph by combining five different signals:

| Signal | Weight | Source |
|---|---|---|
| Explicit YAML relatedEntries | 10 | Human-authored |
| Name/prefix matching (e.g. "anthropic" ↔ "anthropic-ipo") | 6 | Structural |
| Content <EntityLink> references | 5 | Content-derived |
| N-gram content similarity | 0–3 (scaled) | Computed |
| Shared tags (specificity-weighted) | varies | Computed |

Each neighbor's score gets a quality boost based on the target page's quality and importance ratings: boost = 1 + quality/40 + importance/400 (max ~1.45x). Unrated pages default to average values so they aren't penalized.
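The boost formula above can be sketched as follows. This is a minimal illustration, not the code in build-data.mjs: the function names, the rating scales, and the "average" defaults for unrated pages are all assumptions.

```typescript
// Hypothetical sketch of the quality boost. The formula comes from the text;
// the defaults below (5 and 50) are assumed stand-ins for "average values".
const DEFAULT_QUALITY = 5;
const DEFAULT_IMPORTANCE = 50;

interface PageRatings {
  quality?: number;
  importance?: number;
}

// boost = 1 + quality/40 + importance/400
function qualityBoost(target: PageRatings): number {
  const quality = target.quality ?? DEFAULT_QUALITY;
  const importance = target.importance ?? DEFAULT_IMPORTANCE;
  return 1 + quality / 40 + importance / 400;
}

// Multiply a neighbor's raw signal score by the boost of the target page.
function boostedScore(rawScore: number, target: PageRatings): number {
  return rawScore * qualityBoost(target);
}

console.log(qualityBoost({ quality: 5, importance: 50 })); // 1.25
console.log(qualityBoost({}));                             // 1.25 (unrated page uses the assumed defaults)
```

Because unrated pages fall back to the defaults, they receive the same boost as an average rated page rather than the minimum of 1.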

The output uses type-diverse selection: at least 2 entries from each entity type are guaranteed before filling remaining slots by score. This prevents the "related" sidebar from being dominated by one type.
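One way to implement type-diverse selection is a two-pass pick: reserve slots per type first, then fill by score. The sketch below assumes a hypothetical Neighbor shape and a selectDiverse helper; the real selection logic may differ.

```typescript
// Hypothetical neighbor record; field names are assumptions.
interface Neighbor {
  id: string;
  type: string;
  score: number;
}

function selectDiverse(neighbors: Neighbor[], limit: number, minPerType = 2): Neighbor[] {
  const ranked = [...neighbors].sort((a, b) => b.score - a.score);
  const picked: Neighbor[] = [];
  const pickedIds = new Set<string>();
  const perType = new Map<string, number>();

  // Pass 1: reserve up to minPerType of the best-scoring entries per type.
  for (const n of ranked) {
    const count = perType.get(n.type) ?? 0;
    if (count < minPerType && picked.length < limit) {
      picked.push(n);
      pickedIds.add(n.id);
      perType.set(n.type, count + 1);
    }
  }

  // Pass 2: fill any remaining slots purely by score.
  for (const n of ranked) {
    if (picked.length >= limit) break;
    if (!pickedIds.has(n.id)) {
      picked.push(n);
      pickedIds.add(n.id);
    }
  }

  return picked.sort((a, b) => b.score - a.score);
}
```

Note that the per-type guarantee can only hold while the limit leaves room for it; with many types and a small limit, this sketch serves the highest-scoring types first.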

Directional relationship labels ("mitigates", "caused by") are preserved through an inverse-label mapping table, so both directions of a relationship get meaningful labels.

2. Stable Numeric ID System

Location: app/scripts/build-data.mjs (lines 704-776)

Every entity gets a stable numeric ID (E1, E42, E552), so canonical URLs survive slug renames.

The key insight: IDs are allocated atomically by the wiki server and written back to source files. This means:

  • YAML entities and MDX frontmatter are the single source of truth
  • New entities get auto-assigned IDs from the server on their first build
  • The server's PostgreSQL database prevents race conditions and ID reassignment
  • ID conflicts are detected at build time and fail the build

Resolution at runtime supports both numeric IDs and slugs, so <EntityLink id="E521" name="coefficient-giving" /> and path-based lookups both work.

3. Merged Backlink Sources

Location: app/scripts/build-data.mjs (lines 190-225, 867-888)

The system merges two backlink sources:

  • Explicit YAML relatedEntries (semantic, directional)
  • MDX content scans for EntityLink components (implicit, extracted by regex)

Content scanning happens before raw MDX is stripped from the build output, creating an inbound-link map that's deduplicated by ID. This means every entity knows both who links to it and who it links to, without manual maintenance.
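The merge described above could look roughly like this. The SourcePage shape, the buildBacklinks name, and the exact regex are assumptions for illustration; the regex only handles double-quoted id attributes.

```typescript
// Hypothetical page record: explicit YAML links plus raw MDX content.
interface SourcePage {
  id: string;
  relatedEntries?: string[];
  mdx: string;
}

// Returns a map from target entity ID to the set of pages that link to it.
function buildBacklinks(pages: SourcePage[]): Map<string, Set<string>> {
  const inbound = new Map<string, Set<string>>();

  const add = (target: string, source: string): void => {
    if (target === source) return; // ignore self-links
    let sources = inbound.get(target);
    if (!sources) {
      sources = new Set<string>();
      inbound.set(target, sources);
    }
    sources.add(source); // the Set deduplicates repeated links
  };

  for (const page of pages) {
    // Source 1: explicit YAML relatedEntries.
    for (const target of page.relatedEntries ?? []) add(target, page.id);

    // Source 2: <EntityLink id="..."> references found in the MDX body.
    const linkPattern = /<EntityLink\s+[^>]*?\bid="([^"]+)"/g;
    let match: RegExpExecArray | null;
    while ((match = linkPattern.exec(page.mdx)) !== null) {
      add(match[1], page.id);
    }
  }
  return inbound;
}
```

Outbound links fall out of the same data: a page's outbound set is every target for which it appears as a source.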

4. N-gram Redundancy Detection

Location: app/scripts/lib/redundancy.mjs

Pages are compared using 5-word n-gram shingling (Jaccard similarity) combined with word overlap. The system:

  • Extracts clean text (strips code blocks, JSX, tables, headers, markdown formatting)
  • Compares only within the same contentFormat (articles vs. tables vs. diagrams) to avoid false positives
  • Uses max(shingleSimilarity, wordSimilarity * 0.8) as combined metric
  • Stores top 5 similar pages per page at a 10% threshold

This feeds into the relationship graph (signal #4) and helps editors find pages that overlap.
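A minimal sketch of the combined metric is shown below. It assumes the text has already been cleaned, and it assumes "word overlap" means Jaccard similarity over word sets, which the text does not specify; all function names are illustrative.

```typescript
// Lowercase alphanumeric tokens; the real tokenizer may differ.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Overlapping n-word shingles (5 words by default, as in the text).
function shingles(words: string[], n = 5): Set<string> {
  const out = new Set<string>();
  for (let i = 0; i + n <= words.length; i++) {
    out.add(words.slice(i, i + n).join(" "));
  }
  return out;
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|, defined as 0 for two empty sets.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  a.forEach((item) => {
    if (b.has(item)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

// Combined metric from the text: max(shingleSimilarity, wordSimilarity * 0.8).
function combinedSimilarity(textA: string, textB: string): number {
  const wordsA = tokenize(textA);
  const wordsB = tokenize(textB);
  const shingleSimilarity = jaccard(shingles(wordsA), shingles(wordsB));
  const wordSimilarity = jaccard(new Set(wordsA), new Set(wordsB));
  return Math.max(shingleSimilarity, wordSimilarity * 0.8);
}

console.log(combinedSimilarity("a b c d e", "e d c b a")); // 0.8 (same words, no shared shingles)
```

The 0.8 factor means pure vocabulary overlap can never score as high as verbatim phrase overlap, which keeps topically similar but independently written pages below copied passages.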

5. Safe Expression Evaluator for Computed Facts

Location: app/scripts/lib/computed-facts.mjs

Facts can reference other facts in expressions like {openai.revenue-2024} * {growth-rate}. Instead of using eval(), the system uses a hand-written recursive descent parser that supports:

  • Human-readable numeric parsing: "$350 billion" → 350_000_000_000, "40%" → 0.4
  • Arithmetic: +, -, *, /, parentheses
  • {entity.factId} references resolved in topological order
  • Format strings for display (currency prefixes, unit scaling)

Dependencies are resolved topologically, so a fact referencing another computed fact works correctly. Non-computable facts (qualitative values) are flagged with noCompute and skipped.
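To make the approach concrete, here is a small recursive descent evaluator in the same spirit. It is a sketch under assumptions, not the code in computed-facts.mjs: the function names, the FactLookup callback, and the "acme" facts are invented for illustration, and topological ordering is left to the caller.

```typescript
type FactLookup = (ref: string) => number;

const SCALES: Record<string, number> = { thousand: 1e3, million: 1e6, billion: 1e9, trillion: 1e12 };

// Parses human-readable values such as "$350 billion" or "40%".
function parseHumanNumber(text: string): number {
  const m = /^\$?\s*(-?[\d,.]+)\s*(thousand|million|billion|trillion|%)?$/i.exec(text.trim());
  if (!m) throw new Error(`Not a number: ${text}`);
  const base = Number(m[1].replace(/,/g, ""));
  const unit = (m[2] ?? "").toLowerCase();
  if (unit === "%") return base / 100;
  return unit ? base * SCALES[unit] : base;
}

// Grammar: expression := term (("+" | "-") term)*
//          term       := factor (("*" | "/") factor)*
//          factor     := number | "{" ref "}" | "(" expression ")" | "-" factor
function evaluate(expr: string, lookup: FactLookup): number {
  let pos = 0;
  const peek = (): string => expr[pos] ?? "";
  const skipSpaces = (): void => {
    while (peek() === " ") pos++;
  };

  function parseExpression(): number {
    let value = parseTerm();
    for (;;) {
      skipSpaces();
      const op = peek();
      if (op !== "+" && op !== "-") return value;
      pos++;
      const rhs = parseTerm();
      value = op === "+" ? value + rhs : value - rhs;
    }
  }

  function parseTerm(): number {
    let value = parseFactor();
    for (;;) {
      skipSpaces();
      const op = peek();
      if (op !== "*" && op !== "/") return value;
      pos++;
      const rhs = parseFactor();
      value = op === "*" ? value * rhs : value / rhs;
    }
  }

  function parseFactor(): number {
    skipSpaces();
    const ch = peek();
    if (ch === "(") {
      pos++;
      const value = parseExpression();
      skipSpaces();
      if (peek() !== ")") throw new Error(`Expected ")" at ${pos}`);
      pos++;
      return value;
    }
    if (ch === "{") {
      const end = expr.indexOf("}", pos);
      if (end < 0) throw new Error(`Unclosed "{" at ${pos}`);
      const ref = expr.slice(pos + 1, end);
      pos = end + 1;
      return lookup(ref); // {entity.factId} reference
    }
    if (ch === "-") {
      pos++;
      return -parseFactor();
    }
    const m = /^\d+(\.\d+)?/.exec(expr.slice(pos));
    if (!m) throw new Error(`Unexpected "${ch}" at ${pos}`);
    pos += m[0].length;
    return Number(m[0]);
  }

  const result = parseExpression();
  skipSpaces();
  if (pos < expr.length) throw new Error(`Unexpected "${peek()}" at ${pos}`);
  return result;
}

// Made-up facts for a hypothetical entity, purely for illustration.
const exampleFacts = new Map<string, number>([
  ["acme.revenue", parseHumanNumber("$4 billion")],
  ["acme.growth-rate", parseHumanNumber("50%")],
]);
const exampleLookup: FactLookup = (ref) => {
  const value = exampleFacts.get(ref);
  if (value === undefined) throw new Error(`Unknown fact: ${ref}`);
  return value;
};

console.log(evaluate("{acme.revenue} * (1 + {acme.growth-rate})", exampleLookup)); // 6000000000
```

Because the parser only recognizes numbers, the four operators, parentheses, and braces, any other input is rejected with an error instead of being executed, which is the safety property eval() lacks.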

6. Build-Time Entity Transformation

Location: app/scripts/lib/entity-transform.mjs

Raw YAML entities are transformed into strictly typed entities at build time via a pure transformEntity() function. This handles:

  • Type migration: Old names (lab-frontier, researcher) map to canonical types (organization, person)
  • Subtype extraction: lab-frontier → organization + orgType: frontier-lab
  • CustomField extraction: Generic key-value customFields are promoted to typed fields (Role, Affiliation, Founded)
  • Risk categorization: Risk entities are auto-categorized (epistemic, misuse, structural) via a mapping table

By doing this at build time, the runtime never encounters legacy type names. Unknown entity types pass through unchanged — the system is forward-compatible.
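A pure transform of this kind might be sketched as below. The alias entries and promoted field names come from the text; the RawEntity and CustomField shapes, and the lowercased target keys, are assumptions. Risk categorization is omitted.

```typescript
// Assumed shapes; the real YAML schema may differ.
interface CustomField {
  label: string;
  value: string;
}

interface RawEntity {
  id: string;
  type: string;
  customFields?: CustomField[];
  [key: string]: unknown;
}

// Legacy type name -> canonical type (plus optional subtype).
const TYPE_ALIASES = new Map<string, { type: string; orgType?: string }>([
  ["lab-frontier", { type: "organization", orgType: "frontier-lab" }],
  ["researcher", { type: "person" }],
]);

// Custom fields that are promoted to typed top-level fields.
const PROMOTED_FIELDS = new Set(["Role", "Affiliation", "Founded"]);

// Pure function: returns a new object and never mutates the input.
function transformEntity(raw: RawEntity): Record<string, unknown> {
  const { customFields = [], ...rest } = raw;
  const out: Record<string, unknown> = { ...rest, ...(TYPE_ALIASES.get(raw.type) ?? {}) };

  const remaining: CustomField[] = [];
  for (const field of customFields) {
    if (PROMOTED_FIELDS.has(field.label)) {
      out[field.label.toLowerCase()] = field.value;
    } else {
      remaining.push(field);
    }
  }
  if (remaining.length > 0) out["customFields"] = remaining;
  return out;
}
```

Types with no alias entry keep their original type value, which is how unknown entity types pass through unchanged in this sketch.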

7. Format-Aware Quality Metrics

Location: crux/lib/metrics-extractor.ts

Content quality is measured structurally, but the scoring adapts to the content format:

  • Articles are scored on word count, section structure, presence of overview/conclusion
  • Tables aren't penalized for low word count
  • Diagrams don't need prose length or section counts

Metrics include raw counts (words, tables, diagrams, internal links, footnotes), ratios (bullet density), boolean checks (has overview?), and a composite structural score (0-15 raw, normalized to 0-50). A suggestQuality function proposes quality ratings based on structural scores, and getQualityDiscrepancy flags pages where the LLM-assigned quality disagrees with structural evidence.

8. Single-Pass Validation Engine

Location: crux/lib/validation-engine.ts

Instead of having 20+ separate validator scripts that each re-read all 625 files, the validation engine loads all content once and runs composable rules against it:

engine.load()  →  read all files once  →  run each Rule.check()  →  collect Issues

Each Rule has:

  • check(): returns issues (pure function)
  • Optional fix(): returns corrected content (declarative FixSpec with oldText/newText)
  • scope: 'file' (runs per file) or 'global' (runs once on all files)

Fixes are applied in bulk and logged to edit-logs. Four rules are CI-blocking (comparison-operators, dollar-signs, frontmatter-schema, numeric-id-integrity); the rest are advisory.
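The rule contract can be sketched with a few interfaces and a runner. Everything here is illustrative: the interface shapes, the runRules and applyFix helpers, and the "double-space" rule are assumptions, not the actual validation-engine.ts API.

```typescript
interface ContentFile {
  path: string;
  content: string;
}

interface FixSpec {
  oldText: string;
  newText: string;
}

interface Issue {
  rule: string;
  path: string;
  message: string;
}

interface Rule {
  id: string;
  scope: "file" | "global";
  check(files: ContentFile[]): Issue[];
  fix?(issue: Issue, file: ContentFile): FixSpec | null;
}

// Files are loaded once by the caller; every rule runs against the same in-memory set.
function runRules(files: ContentFile[], rules: Rule[]): Issue[] {
  const issues: Issue[] = [];
  for (const rule of rules) {
    if (rule.scope === "global") {
      issues.push(...rule.check(files));
    } else {
      for (const file of files) issues.push(...rule.check([file]));
    }
  }
  return issues;
}

// Applies a declarative fix. The replacer function avoids "$" being treated as a pattern.
function applyFix(file: ContentFile, spec: FixSpec): ContentFile {
  return { ...file, content: file.content.replace(spec.oldText, () => spec.newText) };
}

// Hypothetical file-scoped rule used only to exercise the sketch.
const doubleSpaceRule: Rule = {
  id: "double-space",
  scope: "file",
  check: (files) =>
    files
      .filter((f) => f.content.includes("  "))
      .map((f) => ({ rule: "double-space", path: f.path, message: "Double space found" })),
  fix: () => ({ oldText: "  ", newText: " " }),
};
```

Keeping check() free of side effects and expressing fixes as oldText/newText pairs means a fix can be previewed, logged, or skipped without re-running the rule.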

9. YAML-First MDX Generation

Location: app/scripts/lib/mdx-generator.mjs

For entities whose content is defined entirely in YAML, minimal MDX stub files are auto-generated. The guard condition for regeneration is carefully conservative:

shouldGenerateMdx = !fileExists OR (isAutoGenerated AND no ## headings AND < 20 lines)

This means: generate if missing, regenerate if it's a short auto-generated stub, but never overwrite custom content (detected by presence of ## headings or significant length). This enables a YAML-first workflow where data authors edit YAML and MDX files appear automatically.
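Expressed as code, the guard might look like the following. The StubCheck input shape is an assumption, and how a file is recognized as auto-generated is not described in the text, so it is passed in as a flag.

```typescript
// Hypothetical input: the caller decides how to detect auto-generated files.
interface StubCheck {
  fileExists: boolean;
  isAutoGenerated: boolean;
  content: string;
}

function shouldGenerateMdx({ fileExists, isAutoGenerated, content }: StubCheck): boolean {
  if (!fileExists) return true; // generate if missing

  const hasH2Heading = /^##\s/m.test(content); // a "## " heading signals custom content
  const lineCount = content.split("\n").length;

  // Regenerate only short, heading-free, auto-generated stubs.
  return isAutoGenerated && !hasH2Heading && lineCount < 20;
}
```

All three conditions must hold for regeneration, so a stub that a contributor has extended with a single ## heading is left alone even if it still carries the auto-generated marker.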

10. Lazy-Loaded Index System

Location: app/src/data/index.ts

The database is loaded once at server startup, but indexes are built lazily on first access:

let _index: Map<string, Entity> | null = null;
function getIndex() {
  if (!_index) _index = new Map(getEntities().map(e => [e.id, e]));
  return _index;
}

This avoids building indexes for entity types that are never queried in a given request. Combined with Zod validation at load time (with graceful fallback to GenericEntity for unknown types), it balances strictness with forward-compatibility.

11. Server-Side Full-Text Search

Location: app/src/lib/search.ts, app/src/app/api/search/route.ts

Search uses PostgreSQL full-text search via the wiki-server, proxied through the /api/search Next.js route. The client sends queries to this proxy, which forwards them to the wiki-server's search endpoint. This replaced an earlier MiniSearch client-side fallback, simplifying the search stack and reducing client bundle size.

12. Entity Ontology with Display Metadata

Location: app/src/data/entity-ontology.ts

A single file defines the canonical ontology for 30+ entity types, each with:

  • Lucide icon component
  • iconColor (Tailwind classes, light + dark variants)
  • badgeColor for explore-page filtering
  • headerColor for InfoBox headers

Organization subtypes (frontier-lab, safety-org, startup, academic) get their own display metadata via a separate ORG_TYPE_DISPLAY map. Backward-compat aliases (researcher → person, lab-* → organization) allow gradual migrations without breaking existing data.

13. Per-Page Edit Logs

Location: crux/lib/edit-log.ts, data/edit-logs/

Each page has a separate YAML file (data/edit-logs/<page-id>.yaml) tracking who changed it, when, and how:

- date: "2026-02-13"
  tool: crux-improve
  agency: ai-directed
  tier: standard
  note: "Added citations and restructured overview"

By storing edit history outside of page frontmatter, the system separates editorial metadata from content. LLM-generated content can't accidentally corrupt the edit log. The bulk-fix system logs one entry per fixed file automatically.

14. Session Log → Change History Integration

Location: app/scripts/build-data.mjs (lines 45-100)

Claude Code session logs (.claude/sessions/*.md) are parsed at build time and attached to pages in database.json. The structured format:

## 2026-02-13 | branch-name | Short title
**What was done:** Summary text.
**Pages:** page-id-1, page-id-2

...enables the system to show "what changed and why" for any page, correlated with git branches and PRs, without modifying the content files themselves.
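A parser for this format could be as simple as the line-based sketch below. The SessionEntry shape and parseSessionLog name are assumptions; only the three line patterns are taken from the format shown above.

```typescript
interface SessionEntry {
  date: string;
  branch: string;
  title: string;
  summary: string;
  pages: string[];
}

function parseSessionLog(markdown: string): SessionEntry[] {
  const entries: SessionEntry[] = [];
  let current: SessionEntry | null = null;

  for (const line of markdown.split("\n")) {
    // "## 2026-02-13 | branch-name | Short title" starts a new entry.
    const header = /^## (\d{4}-\d{2}-\d{2}) \| (.+?) \| (.+)$/.exec(line);
    if (header) {
      current = { date: header[1], branch: header[2], title: header[3], summary: "", pages: [] };
      entries.push(current);
      continue;
    }
    if (!current) continue; // skip anything before the first header

    const summary = /^\*\*What was done:\*\*\s*(.*)$/.exec(line);
    if (summary) current.summary = summary[1];

    const pages = /^\*\*Pages:\*\*\s*(.*)$/.exec(line);
    if (pages) {
      current.pages = pages[1].split(",").map((p) => p.trim()).filter((p) => p.length > 0);
    }
  }
  return entries;
}
```

Attaching entries to pages is then a matter of iterating each entry's pages list and appending the entry to that page's change history.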

15. Frontmatter Entity Auto-Creation

Location: app/scripts/lib/frontmatter-scanner.mjs

Pages don't need a corresponding YAML entity file. The build script auto-creates entities from MDX frontmatter for any page that doesn't have one:

YAML entities (explicit) + frontmatter entities (auto-created) = full entity set

YAML entities take precedence. This means a page can start as just an MDX file with frontmatter, and the system treats it as a first-class entity — it gets a numeric ID, appears in search, and can be linked via <EntityLink>.

16. Inverse Relationship Labels

Location: app/scripts/build-data.mjs (lines 259-291)

When entity A declares relationship: "mitigates" toward entity B, the system auto-generates the inverse label for the B→A direction using a lookup table:

"mitigates" ↔ "mitigated by"
"causes" ↔ "caused by"
"enables" ↔ "enabled by"
"child-of" ↔ "parent of"

Explicit labels are never overwritten by inferred ones. This gives both sides of a relationship meaningful edge labels without requiring authors to declare both directions.
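The lookup and the "explicit wins" rule can be sketched as follows. The Edge shape and withInverseEdges helper are hypothetical; the four label pairs are the ones listed above.

```typescript
const INVERSE_PAIRS: Array<[string, string]> = [
  ["mitigates", "mitigated by"],
  ["causes", "caused by"],
  ["enables", "enabled by"],
  ["child-of", "parent of"],
];

// Build a two-way lookup so either side of a pair resolves to the other.
const INVERSE_LABELS = new Map<string, string>();
for (const [forward, backward] of INVERSE_PAIRS) {
  INVERSE_LABELS.set(forward, backward);
  INVERSE_LABELS.set(backward, forward);
}

// Hypothetical edge record.
interface Edge {
  from: string;
  to: string;
  label: string;
}

function withInverseEdges(edges: Edge[]): Edge[] {
  const edgeKey = (from: string, to: string): string => `${from}->${to}`;
  const explicit = new Set(edges.map((e) => edgeKey(e.from, e.to)));
  const out = [...edges];

  for (const edge of edges) {
    // An explicitly declared reverse edge is never overwritten.
    if (explicit.has(edgeKey(edge.to, edge.from))) continue;
    const inverse = INVERSE_LABELS.get(edge.label);
    if (inverse) out.push({ from: edge.to, to: edge.from, label: inverse });
  }
  return out;
}
```

In this sketch, labels without a known inverse produce no inferred edge; the real build script may choose a different fallback.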

17. Tag Specificity Weighting

Location: app/scripts/build-data.mjs (lines 349-361)

When computing related entities from shared tags, rarer tags get more weight:

specificity = 1 / log2(tagCount + 2)

A tag shared by 3 entities is more informative than one shared by 300. This prevents broad tags like "ai-safety" from drowning out specific connections.
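The effect of the formula is easy to see numerically. In the sketch below, tagCount is assumed to be the number of entities carrying the tag, and summing specificity over shared tags is an assumed way to combine them; the function names are illustrative.

```typescript
// specificity = 1 / log2(tagCount + 2), as given in the text.
function tagSpecificity(tagCount: number): number {
  return 1 / Math.log2(tagCount + 2);
}

// Assumed aggregation: sum the specificity of every tag two entities share.
function sharedTagScore(tagsA: string[], tagsB: string[], tagCounts: Map<string, number>): number {
  const tagsOfB = new Set(tagsB);
  let score = 0;
  for (const tag of tagsA) {
    if (tagsOfB.has(tag)) score += tagSpecificity(tagCounts.get(tag) ?? 0);
  }
  return score;
}

console.log(tagSpecificity(3).toFixed(2));   // "0.43" (rare tag)
console.log(tagSpecificity(300).toFixed(2)); // "0.12" (broad tag)
```

The + 2 inside the logarithm keeps the denominator at 1 or above for any non-negative count, so specificity stays in the range (0, 1] and never divides by zero.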


Core Systems

Entity Data Pipeline

Flow: YAML sources → entity-transform.mjs → build-data.mjs → JSON artifacts → React components

| Component | Location | Purpose |
|---|---|---|
| KB YAML | packages/kb/data/things/*.yaml | Authoritative structured facts (valuations, revenue, etc.) |
| Source YAML | data/entities/*.yaml | Human-editable entity definitions |
| Entity transform | app/scripts/lib/entity-transform.mjs | Type mapping and normalization |
| Build script | app/scripts/build-data.mjs | Main compilation pipeline |
| Generated JSON | app/src/data/database.json | Browser-ready merged data |
| Data layer | app/src/data/index.ts | Runtime access with Zod validation |
| Components | app/src/components/wiki/ | Display entity data |

Key files generated:

  • database.json — All entities, pages, relations, facts, search data, statistics (includes ID registry)

Wiki-Server (PostgreSQL)

Purpose: Durable storage for citation content, audit results, claims, facts, and other structured data. Provides full-text search and typed API access.

Location: Remote PostgreSQL database accessed via the wiki-server's Hono RPC API.

| Table | Purpose |
|---|---|
| citation_content | Full text of fetched source URLs |
| citation_audits | Per-page citation verification results |
| claims | Extracted atomic claims with source references |
| resources | External resource metadata |
| entities | Entity metadata (synced from YAML) |
| agent_sessions | Claude Code session logs |

CLI tools access the database through apiRequest() in crux/lib/wiki-server/. The frontend uses typed RPC clients with InferResponseType<> for compile-time type safety.

See: Content Database for the full storage architecture.

Page Creation Pipeline

Purpose: Generate wiki pages with proper citations using AI research and synthesis.

Pipeline phases:

canonical-links → research-perplexity → register-sources → fetch-sources
    → research-scry → synthesize → verify-sources → validate-loop → grade

Key design decisions:

| Decision | Rationale |
|---|---|
| Perplexity for research | Cheap (≈$0.10), good at web search, provides citation URLs |
| Register + fetch sources | Enables quote verification against actual source content |
| Verify-sources phase | Catches hallucinated quotes before publication |
| Validation loop | Iterative fixing ensures build-passing output |

Cost tiers: budget ($2-3), standard ($4-6), premium ($8-12) for create; polish ($2-3), standard ($5-8), deep ($10-15) for improve.

See: Page Creator Pipeline for experiment results.

Crux CLI

Purpose: Unified CLI for all wiki tooling.

Architecture: Domain-based command dispatch with 12+ domains:

pnpm crux validate          # Validation suite
pnpm crux content create    # AI page creation
pnpm crux content improve   # AI page improvement
pnpm crux fix escaping      # Auto-fix MDX issues
pnpm crux analyze           # Content analysis
pnpm crux edit-log view     # Per-page edit history

Each domain is a module with a commands export. Commands are async functions returning {output, exitCode}. A --ci flag switches output to JSON for CI integration.
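Domain-based dispatch of this shape might be sketched as below. The type names, the "default" command fallback, the made-up validation result, and the way --ci is threaded through are all assumptions; see crux/README.md for the actual behavior.

```typescript
interface CommandResult {
  output: string;
  exitCode: number;
}

interface CommandOptions {
  ci: boolean;
}

type Command = (args: string[], options: CommandOptions) => Promise<CommandResult>;

interface DomainModule {
  commands: Map<string, Command>;
}

// Hypothetical command: reports a made-up issue, as JSON when --ci is set.
const runValidate: Command = async (_args, { ci }) => {
  const issues = [{ rule: "dollar-signs", file: "example.mdx" }];
  const output = ci ? JSON.stringify({ issues }) : `${issues.length} issue(s) found`;
  return { output, exitCode: issues.length > 0 ? 1 : 0 };
};

const domains = new Map<string, DomainModule>([
  ["validate", { commands: new Map<string, Command>([["default", runValidate]]) }],
]);

async function dispatch(argv: string[]): Promise<CommandResult> {
  const ci = argv.indexOf("--ci") !== -1;
  const positional = argv.filter((arg) => arg !== "--ci");

  const domainName = positional[0] ?? "";
  const domain = domains.get(domainName);
  if (!domain) return { output: `Unknown domain: ${domainName}`, exitCode: 1 };

  // Treat the second word as a command name if the domain defines it;
  // otherwise fall back to the domain's "default" command.
  const named = domain.commands.get(positional[1] ?? "");
  const command = named ?? domain.commands.get("default");
  if (!command) return { output: `No command found for domain: ${domainName}`, exitCode: 1 };

  return command(positional.slice(named ? 2 : 1), { ci });
}
```

Returning {output, exitCode} instead of printing and exiting inside each command keeps commands testable and lets a single entry point decide how to render results.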

See: crux/README.md for the full domain reference.

Validation System

Purpose: Enforce content quality and consistency at multiple levels.

Architecture: Single-pass validation engine runs composable rules. Each rule checks specific patterns and can optionally auto-fix issues.

| Category | Examples | Blocking? |
|---|---|---|
| Critical | dollar-signs, entitylink-ids, fake-urls | Yes - breaks build |
| Quality | tilde-dollar, markdown-lists, placeholders | No - warnings only |

Design decision: Two-tier validation allows fast feedback while still catching serious issues. Critical rules run in CI; quality rules are advisory.


Data Flow Diagrams

Page Creation Data Flow

[Diagram: page creation data flow]

Entity Resolution Flow

[Diagram: entity resolution flow]

Design Principles

1. Source Files as Single Source of Truth

Human-editable files (YAML, MDX) are the canonical source. Everything else — JSON, search indexes, the ID registry — is a derived build artifact that can be regenerated. Generated files are gitignored where appropriate. This means: no merge conflicts on generated data, clear ownership boundaries, and deterministic builds from source.

2. Build-Time Computation, Runtime Speed

Expensive operations (relationship graph computation, redundancy detection, fact evaluation, search index building, entity transformation) all happen at build time. Runtime reads pre-computed JSON through lazy-loaded indexes. The result: a fast site with rich computed data, without runtime computation costs.

3. Progressive Enhancement for AI Features

AI features (summaries, page creation, grading) are optional enhancements. The wiki builds and serves without any API keys. Failures in the AI pipeline don't break the site. Costs are predictable and opt-in per-tier.

4. Validation at Multiple Levels

| Level | Tool | When | Blocking? |
|---|---|---|---|
| Syntax | MDX compiler | Build time | Yes |
| Schema | Zod validation | Build time (with fallback) | Soft |
| Content rules | Validation engine | CI | 4 rules blocking |
| References | EntityLink validator | CI | Advisory |
| Quality | Grading pipeline | Manual trigger | No |

5. Forward-Compatible by Default

Unknown entity types pass through as GenericEntity (preserving all custom fields). Backward-compat aliases handle gradual migrations. The Zod schema validation logs warnings in dev but doesn't fail builds for unrecognized types. New features can be added to the data layer without updating every consumer.


Key Configuration Files

| File | Purpose | When to Edit |
|---|---|---|
| app/next.config.ts | Next.js + MDX configuration | Adding plugins, redirects |
| app/src/data/entity-schemas.ts | Entity type definitions (Zod) | Adding entity types or fields |
| app/src/data/entity-ontology.ts | Display metadata (icons, colors) | Adding entity display styles |
| app/src/data/entity-type-names.ts | Canonical entity type list | Adding new entity types |
| app/src/lib/internal-nav.ts | Internal sidebar navigation | Adding internal pages |
| app/scripts/build-data.mjs | Main build pipeline | Changing data flow |
| crux/lib/validation-engine.ts | Validation rules framework | Adding validation rules |

Environment Variables

| Variable | Purpose | Required For |
|---|---|---|
| ANTHROPIC_API_KEY | Claude API access | Summaries, grading, page creation |
| OPENROUTER_API_KEY | Perplexity via OpenRouter | Page creation research |
| FIRECRAWL_KEY | Web page fetching | Source content fetching |
| SCRY_API_KEY | Academic paper search | Deep research tier |

All are optional. Features gracefully degrade when keys are missing.


Repository Structure

longterm-wiki/
├── content/docs/               # ~700 MDX wiki pages
│   ├── knowledge-base/         # Risks, responses, orgs, people
│   ├── models/                 # Analytical frameworks
│   ├── project/                # Public project documentation
│   └── internal/               # Contributor docs (including this page)
├── packages/kb/                # Knowledge Base package
│   ├── data/things/            # Authoritative structured facts (KB YAML)
│   ├── data/schemas/           # Property schemas (60 properties)
│   └── src/                    # KB loader, custom YAML tags (!ref, !date)
├── data/                       # YAML source data
│   ├── entities/               # Entity definitions (split by type)
│   ├── facts/                  # Legacy facts (deprecated for KB entities)
│   ├── resources/              # External resource metadata
│   ├── insights/               # Cross-page insights
│   ├── graphs/                 # Cause-effect graph YAML
│   └── edit-logs/              # Per-page edit history
├── app/                        # Next.js 15 frontend
│   ├── src/
│   │   ├── app/                # App Router pages
│   │   ├── components/         # React components (wiki/, ui/)
│   │   ├── data/               # Data layer + Zod schemas
│   │   └── lib/                # Utilities, search, navigation
│   └── scripts/                # Build scripts + libraries
│       ├── build-data.mjs      # Main data compilation pipeline
│       └── lib/                # Build utilities (transform, metrics, search, etc.)
├── crux/                       # Crux CLI + validation
│   ├── crux.mjs                # CLI entry point
│   ├── commands/               # Domain handlers
│   ├── authoring/              # Page create/improve/grade
│   ├── lib/                    # Validation engine, templates, utilities
│   └── validate/               # Validation rule implementations
└── package.json                # pnpm workspace root

Documentation Maintenance

This architecture documentation should be updated when:

  1. New pipeline phases added — Update the pipeline diagram and phase list
  2. New clever patterns introduced — Add to the "Clever Ideas" section
  3. Database schema changes — Update the ER diagram
  4. New environment variables — Add to the environment variables table
  5. Tech stack changes — Update the stack table and diagrams

  • About This Wiki — Contributor overview
  • Content Database — Storage architecture (PostgreSQL, caching, YAML)
  • Automation Tools — CLI reference
  • Page Creator Pipeline — Generation experiments
  • Schema Overview — Entity types and data relationships
  • Entity Reference — Complete entity type catalog
  • Data System Authority Rules — Which data system is authoritative for each entity
  • Canonical Facts & Calc — KB fact components and usage conventions