Citation Architecture: Current State & Unified Proposal
This document analyzes the current citation/reference system in the Longterm Wiki and proposes a concrete plan to unify it. The goal: every wiki page has a single, beautiful bibliography with rich metadata, verification indicators, and inline hover cards --- built on standard Markdown footnotes with zero migration cost.
Tracking issues:
- #974: Unified citation rendering — 3 exemplar pages (Phase 1)
- #975: Citation system migration + cleanup (Phases 2-3)
The Problem: Five Overlapping Systems
The wiki has five overlapping citation/reference mechanisms, the newest being KB fact references:
1. remark-gfm Footnotes (MDX authoring format)
Standard Markdown footnotes used in all ~625 pages:
Kalshi is the first federally regulated prediction market.[^1]
remark-gfm compiles these into:
- Inline:
<sup><a data-footnote-ref href="#user-content-fn-1">[1]</a></sup>
- Bottom section:
<section data-footnotes><ol><li id="user-content-fn-1">...</li></ol></section>
Coverage: Complete (every page has footnotes). Metadata: None (just a title + URL link).
2. References Component (resource YAML rendering)
Server component (apps/web/src/components/wiki/References.tsx) that renders from resource YAML entries in data/resources/*.yaml:
<References pageId="kalshi" />
Shows: title, author, date, credibility badge, publication name, peer-review status, expandable details with summary and verification dots.
Coverage: Partial (only sources that have hand-written YAML entries). On the Kalshi page, 31 resource entries exist for 87 footnotes, which cite ~35 unique source URLs. Metadata: Rich (title, author, date, type, credibility, summary, tags, publication).
3. CitationOverlay (verification indicators)
Client component (apps/web/src/components/wiki/CitationOverlay.tsx) that uses DOM portals to inject colored dots onto footnote [N] refs:
- Finds all <a data-footnote-ref> elements in the article
- Looks up verification data from the Postgres citation_quotes table
- Renders hover cards with accuracy verdicts, supporting quotes, confidence scores
Coverage: Only citations that have been accuracy-checked. Metadata: Verification-only (no source metadata like author/date).
4. ResourceLink (<R>) (unused inline citation)
Component (apps/web/src/components/wiki/ResourceLink.tsx) designed as an alternative to footnotes:
<R id="resource-id" n={1}>Link text</R>
Shows a tooltip with resource metadata + credibility badge. Imported on the Kalshi page but never actually used. No page in the wiki uses this component in practice.
5. KB Fact References (structured data citations)
The newest citation type, added in early 2026. KB facts (packages/kb/data/things/*.yaml) are the canonical source for structured data (employee counts, funding amounts, founding dates, etc.). Pages can cite them via special footnote markers:
Anthropic has approximately 1,500 employees.[^kb-{factId}]
The reference preprocessor (apps/web/src/lib/reference-preprocessor.ts) handles three marker types at build time:
- [^cr-] markers are claim references (legacy, being migrated away via pnpm crux footnotes migrate-cr)
- [^rc-] markers are citation references (standard DB-backed citations)
- [^kb-{factId}] markers are KB fact references (links to structured facts with source URL, sourceResource, sourceQuote, and asOf date)
The preprocessor replaces these markers with numbered [^N] footnotes and appends auto-generated definitions, so remark-gfm processes them like any other footnote. KB fact footnotes include the fact value, date, and source link.
Coverage: Growing --- used on pages that display KB-sourced data. Metadata: Rich (value, unit, date, source URL, source quote, notes).
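The marker replacement described above can be sketched roughly as follows. This is an illustration only, not the code in reference-preprocessor.ts: the function names, the `ReferenceLookup` callback, the marker ID pattern, and the choice to number generated footnotes after the highest existing `[^N]` are all assumptions made for the example.

```typescript
// Hypothetical sketch of the marker-to-footnote rewrite. Names, ID pattern,
// and numbering strategy are assumptions, not the real preprocessor API.
type MarkerKind = "cr" | "rc" | "kb";

// Caller-supplied lookup returning the footnote definition text for a marker.
type ReferenceLookup = (kind: MarkerKind, id: string) => string | undefined;

// Start numbering after the highest hand-written [^N] so numbers never collide.
function nextFreeNumber(mdx: string): number {
  const numeric = /\[\^(\d+)\]/g;
  let max = 0;
  let m: RegExpExecArray | null;
  while ((m = numeric.exec(mdx)) !== null) {
    max = Math.max(max, Number(m[1]));
  }
  return max + 1;
}

function preprocessReferences(mdx: string, lookup: ReferenceLookup): string {
  const assigned = new Map<string, number>(); // "kb-foo" -> footnote number
  const definitions: string[] = [];
  let next = nextFreeNumber(mdx);

  const body = mdx.replace(
    /\[\^(cr|rc|kb)-([A-Za-z0-9_-]+)\]/g,
    (match: string, kind: string, id: string) => {
      const key = `${kind}-${id}`;
      let n = assigned.get(key);
      if (n === undefined) {
        const text = lookup(kind as MarkerKind, id);
        if (text === undefined) return match; // unknown marker: leave untouched
        n = next++;
        assigned.set(key, n);
        definitions.push(`[^${n}]: ${text}`);
      }
      return `[^${n}]`; // repeated markers reuse the same footnote number
    },
  );

  // Append generated definitions so remark-gfm treats them like any footnote.
  return definitions.length > 0 ? `${body}\n\n${definitions.join("\n")}\n` : body;
}
```

Because the output is ordinary footnote syntax, nothing downstream of the preprocessor needs to know which footnotes were hand-written and which were generated.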
The Result
Readers see:
- A plain footnote section at the bottom (remark-gfm) with just title + URL links
- A separate "References" section below that (References component) with different numbering and richer metadata, but only for some sources
- Colored verification dots injected onto inline [N] refs (CitationOverlay) with no connection to either bibliography
- KB fact footnotes generated by the reference preprocessor, which appear as regular numbered footnotes but source their data from the KB YAML layer
Case Study: Kalshi Page
The Kalshi page is a good exemplar because it exercises the full system:
| Metric | Value |
|---|---|
| Total footnotes | 87 |
| Unique source URLs | ≈35 |
| Resource YAML entries | 31 |
| Citation quotes (Postgres) | 87 |
| Verified accurate | 84 (96.6%) |
| Broken citations | 3 (HTTP 403) |
| Quality score | 25/100 |
| Hallucination risk | Medium (50/100) |
Key observations:
- 87 footnotes but only ~35 unique URLs (many footnotes cite the same source)
- 31 resource YAML entries, meaning ~4 URLs have no resource entry at all
- The remark-gfm footnote section lists all 87 entries (with duplicates)
- The References section lists the 31 matched resources (different ordering)
- Citation quotes exist for all 87 footnotes but aren't connected to the resource metadata
The gap: The remark-gfm section shows everything but with no metadata. The References section shows rich metadata but only for a subset. The verification dots are on a third layer. Nothing is unified.
Proposed Architecture: Unified Citations
Core Principle
Keep [^N] footnotes as the authoring format. They're standard Markdown, LLMs generate them naturally, every page already uses them, and they're portable. The magic happens in the build/render pipeline that enriches them with resource metadata and verification data.
Data Model
Key Design Decisions
| Decision | Choice | Reasoning |
|---|---|---|
| Citation format | Keep [^N] | LLMs generate naturally, standard Markdown, zero migration |
| Suppress gfm footnotes | CSS display: none | Simple, reversible, no remark plugin complexity |
| Bibliography grouping | By unique source | 87 footnotes → ~35 entries, much more readable |
| Resource creation | Auto from URLs | Manual entry doesn't scale; Kalshi has only 31 resource entries for 87 footnotes |
| Rendering split | Server + client hybrid | Server renders bibliography; client adds hover cards |
New Pipeline Step: Resource Auto-Registration
New crux command: pnpm crux citations register-resources <pageId>
- Parse all [^N]: Title: URL definitions from MDX
- Extract unique URLs (87 footnotes → ~35 unique URLs for Kalshi)
- For each URL without a resource YAML entry:
  - Use citation_content cache or fetch the URL
  - Extract: title, domain, type, published_date, authors
  - Generate resource ID (SHA-256 hash of URL, first 16 hex chars)
  - Create YAML entry in the appropriate data/resources/ file
- Report: "35 unique URLs, 31 already registered, 4 newly created"
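A minimal sketch of the parsing, deduplication, ID generation, and reporting steps above, assuming Node's built-in crypto module. The function and type names are hypothetical, and the metadata fetch and YAML write are deliberately left out. The parser takes the first http(s) URL on each definition line, which is an assumption about how definitions are formatted.

```typescript
import { createHash } from "node:crypto";

// All names here are hypothetical; only the steps mirror the list above.
interface FootnoteDefinition {
  number: number;
  url: string;
}

interface RegistrationPlan {
  uniqueUrls: string[];
  alreadyRegistered: string[];
  toCreate: { id: string; url: string }[];
}

// Find "[^N]: ..." definition lines and capture the first http(s) URL on each.
function parseFootnoteUrls(mdx: string): FootnoteDefinition[] {
  const pattern = /^\[\^(\d+)\]:.*?(https?:\/\/[^\s)>\]]+)/gm;
  const found: FootnoteDefinition[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(mdx)) !== null) {
    found.push({ number: Number(m[1]), url: m[2] });
  }
  return found;
}

// Resource ID as described above: SHA-256 of the URL, first 16 hex characters.
function resourceIdForUrl(url: string): string {
  return createHash("sha256").update(url).digest("hex").slice(0, 16);
}

// Deduplicate URLs and split them into registered vs. to-be-created.
function planRegistration(mdx: string, registeredUrls: Set<string>): RegistrationPlan {
  const uniqueUrls = Array.from(new Set(parseFootnoteUrls(mdx).map((d) => d.url)));
  const alreadyRegistered = uniqueUrls.filter((u) => registeredUrls.has(u));
  const toCreate = uniqueUrls
    .filter((u) => !registeredUrls.has(u))
    .map((u) => ({ id: resourceIdForUrl(u), url: u }));
  return { uniqueUrls, alreadyRegistered, toCreate };
}

function formatReport(plan: RegistrationPlan): string {
  return (
    `${plan.uniqueUrls.length} unique URLs, ` +
    `${plan.alreadyRegistered.length} already registered, ` +
    `${plan.toCreate.length} newly created`
  );
}
```

Hashing the URL keeps IDs stable across runs, so re-running the command on the same page should not create duplicate entries. URL normalization (trailing slashes, tracking parameters) is not handled here and would affect deduplication.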
Build Step: Footnote Index (not yet implemented)
The original proposal called for build-data.mjs to gain a footnoteIndex mapping every footnote number to its resource. This has not been built. Instead, the reference preprocessor (see Status section below) handles the mapping at MDX compile time for DB-driven references (cr-, rc-, kb-). Standard [^N] footnotes are still processed only by remark-gfm with no resource linkage.
A future footnoteIndex build step would enable richer rendering for standard footnotes (not just DB-driven ones):
{
"footnoteIndex": {
"kalshi": {
"1": { "resourceId": "abc123", "url": "https://kalshi.com/about", "title": "About Kalshi" },
"2": { "resourceId": "def456", "url": "https://research.contrary.com/...", "title": "Contrary Research" }
}
}
}
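Since this build step does not exist yet, the following is only a sketch of how the shape above might be typed and computed. The `ParsedFootnote` and `Resource` input types and both function names are assumptions. Only the output shape is taken from the JSON example.

```typescript
// Output shape follows the JSON example; everything else is hypothetical.
interface FootnoteIndexEntry {
  resourceId: string;
  url: string;
  title: string;
}
type PageFootnotes = Record<string, FootnoteIndexEntry>; // keyed by footnote number
type FootnoteIndex = Record<string, PageFootnotes>; // keyed by page ID

interface ParsedFootnote {
  pageId: string;
  number: number;
  url: string;
}
interface Resource {
  id: string;
  url: string;
  title: string;
}

function buildFootnoteIndex(footnotes: ParsedFootnote[], resources: Resource[]): FootnoteIndex {
  const byUrl = new Map<string, Resource>();
  for (const r of resources) byUrl.set(r.url, r);

  const index: FootnoteIndex = {};
  for (const fn of footnotes) {
    const resource = byUrl.get(fn.url);
    if (!resource) continue; // no resource entry yet: footnote stays unlinked
    const page = index[fn.pageId] ?? (index[fn.pageId] = {});
    page[String(fn.number)] = {
      resourceId: resource.id,
      url: resource.url,
      title: resource.title,
    };
  }
  return index;
}

// Accessor a rendering component could call; returns undefined when unlinked.
function getFootnoteResource(
  index: FootnoteIndex,
  pageId: string,
  footnoteNumber: number,
): FootnoteIndexEntry | undefined {
  return index[pageId]?.[String(footnoteNumber)];
}
```

Footnotes without a matching resource are skipped rather than given placeholder entries, so a missing key signals a registration gap instead of hiding it.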
Proposed Component: UnifiedReferences (not yet implemented)
Would replace both the remark-gfm footnote section and the current References component:
- Groups footnotes by unique source (dedup): footnotes [17]-[21] all citing Sigma World → 1 entry
- Each source entry shows:
  - Title (linked to URL)
  - Metadata: publication/domain, author, year, type
  - Credibility badge
  - Verification dot (aggregate across all claims citing this source)
  - Back-refs: "Referenced by [5] [6] [15] [16] [17] [18] [19] [20] [21]"
- Expandable details:
  - Source summary
  - Per-claim verification table (from citation_quotes)
  - Supporting quotes from source
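The grouping and aggregation behind such a component could look roughly like this. The component is not implemented, so every name here is hypothetical. The three verdict values and the rule that any inaccurate claim marks the whole source are assumptions, since the document does not specify how the aggregate dot is computed.

```typescript
// Hypothetical verdict values; the real citation_quotes schema may differ.
type Verdict = "accurate" | "inaccurate" | "unchecked";

interface FootnoteRef {
  number: number;
  url: string;
  title: string;
  verdict: Verdict;
}

interface SourceEntry {
  url: string;
  title: string;
  backRefs: number[]; // footnote numbers citing this source
  aggregate: Verdict; // one dot for the whole source
}

// Assumed rule: any inaccurate claim wins; all accurate means accurate.
function aggregateVerdict(verdicts: Verdict[]): Verdict {
  if (verdicts.some((v) => v === "inaccurate")) return "inaccurate";
  if (verdicts.length > 0 && verdicts.every((v) => v === "accurate")) return "accurate";
  return "unchecked";
}

function groupBySource(footnotes: FootnoteRef[]): SourceEntry[] {
  const groups = new Map<string, FootnoteRef[]>();
  for (const fn of footnotes) {
    const group = groups.get(fn.url);
    if (group) group.push(fn);
    else groups.set(fn.url, [fn]);
  }

  // Map preserves insertion order, so sources appear in first-citation order.
  const entries: SourceEntry[] = [];
  groups.forEach((group, url) => {
    entries.push({
      url,
      title: group[0].title,
      backRefs: group.map((f) => f.number).sort((a, b) => a - b),
      aggregate: aggregateVerdict(group.map((f) => f.verdict)),
    });
  });
  return entries;
}
```

Grouping by exact URL is the simplest key. Grouping by resource ID instead would also merge footnotes whose URLs differ only cosmetically, at the cost of depending on the resource registry.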
Proposed Component: InlineCitationCards (not yet implemented)
Would replace CitationOverlay with richer hover cards on [N] refs:
- Source title + domain + credibility
- Verification verdict + confidence score
- Supporting quote from source
- "View in References" link
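As a rough illustration of the data such a hover card would need, the sketch below merges resource metadata with an optional verification record into one card model. All types, field names, and the `#ref-N` anchor format are assumptions. The actual component has not been built.

```typescript
// Hypothetical input and output shapes for a hover card.
interface ResourceMeta {
  title: string;
  url: string;
  credibility?: string;
}

interface Verification {
  verdict: string;
  confidence: number;
  quote?: string;
}

interface CitationCard {
  footnote: number;
  title: string;
  domain: string;
  credibility: string | null;
  verdict: string | null;
  confidence: number | null;
  quote: string | null;
  referencesAnchor: string; // target of the "View in References" link
}

// Extract the host without relying on a URL global; falls back to the raw URL.
function domainOf(url: string): string {
  const m = /^https?:\/\/(?:www\.)?([^\/:?#]+)/i.exec(url);
  return m ? m[1] : url;
}

function buildCitationCard(
  footnote: number,
  resource: ResourceMeta,
  verification?: Verification,
): CitationCard {
  return {
    footnote,
    title: resource.title,
    domain: domainOf(resource.url),
    credibility: resource.credibility ?? null,
    // Unverified citations still get a card, just without a verdict section.
    verdict: verification ? verification.verdict : null,
    confidence: verification ? verification.confidence : null,
    quote: verification && verification.quote !== undefined ? verification.quote : null,
    referencesAnchor: `#ref-${footnote}`, // assumed anchor format
  };
}
```

Keeping the merge in a pure function means the server can precompute cards at build time and the client component only has to render them.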
Files Changed
| Status | File | Purpose |
|---|---|---|
| Done | apps/web/src/lib/reference-preprocessor.ts | Preprocesses [^cr-], [^rc-], [^kb-] markers into numbered footnotes |
| Done | crux/commands/footnotes.ts | migrate-cr command: converts [^cr-] to [^kb-] where URLs match KB facts |
| Proposed | crux/citations/register-resources.ts | Auto-registration command |
| Proposed | apps/web/src/components/wiki/UnifiedReferences.tsx | Unified bibliography |
| Proposed | apps/web/src/components/wiki/InlineCitationCards.tsx | Enhanced hover cards |
| Proposed | apps/web/scripts/build-data.mjs | Add footnoteIndex computation |
| Proposed | apps/web/src/data/index.ts | Export footnoteIndex accessor |
| Modify | apps/web/src/app/wiki/[id]/page.tsx | Wire up new components |
| Modify | Global CSS | Suppress section[data-footnotes] |
| Deprecate | References.tsx | Replaced by UnifiedReferences |
| Deprecate | CitationOverlay.tsx | Replaced by InlineCitationCards |
| Evaluate | ResourceLink.tsx (<R>) | No pages use it; likely remove |
Implementation Plan: Three Phases
Phase 1: Get 3 Pages Very Right
Exemplar pages: Kalshi, Anthropic, existential-risk
Scope:
- Resource auto-registration: Build the register-resources command. Run on all 3 pages. Every footnoted URL gets a resource YAML entry with metadata.
- Footnote index: Add footnoteIndex computation to build-data.mjs. Every footnote maps to a resource.
- UnifiedReferences: Build the new component. Groups by unique source, shows rich metadata, verification dots, back-refs.
- InlineCitationCards: Build enhanced hover cards merging resource metadata + verification data.
- Suppress gfm footnotes: CSS hides the duplicate remark-gfm section.
- Citation quality: Run full verification pipeline on all 3 pages. Fix broken citations. Review accuracy flags. Content polish.
- Visual polish: Dark mode, mobile tap-to-show, accessibility.
Exit criteria: All 3 exemplar pages have:
- 100% resource coverage (every footnote → resource)
- Single unified bibliography (no duplicate sections)
- Rich hover cards on all inline [N] refs
- Full citation verification with accuracy verdicts
- 0 broken citations
Phase 2: Large Migration
Scope:
- Batch auto-registration: Run register-resources --all across all ~625 pages
- Resolve conflicts: Handle edge cases (URLs that 404, pages with unusual footnote formats)
- Rebuild all: Rebuild database.json with footnoteIndex for every page
- Verify rendering: Spot-check 20-30 pages across entity types
- Page quality tiers (optional): Add tier: showcase | standard | draft frontmatter. Pages below a quality threshold are auto-hidden from the sidebar.
Exit criteria: Every page renders with UnifiedReferences. No page shows the old dual-bibliography layout.
Phase 3: Cleanup DB + Server + Code
Scope:
- Remove deprecated components: Delete References.tsx, CitationOverlay.tsx, ResourceLink.tsx
- Clean up imports: Remove <R> imports from all MDX files that import but don't use it
- DB cleanup: Ensure citation_quotes have resource_id for all entries. Clean stale citation_content entries.
- Server sync: Sync all new resources to wiki-server Postgres
- Code cleanup: Remove old pageResources computation path (subsumed by footnoteIndex)
- Documentation: Update architecture.mdx, CLAUDE.md references
Exit criteria: No dead code. Single code path for citations. DB consistent. Server synced.
Relationship to KB as the Structured Data Layer
The Knowledge Base (KB) has become the canonical structured data layer for the wiki. Rather than a separate "claim-first" architecture, the KB serves as the authoritative source for quantitative and factual data, with citations flowing through the [^kb-] mechanism.
| What exists today | What it enables |
|---|---|
| KB facts YAML (packages/kb/data/things/) | Single source of truth for structured data (counts, dates, funding, etc.) |
| [^kb-] footnotes | Pages cite KB facts directly; preprocessor generates sourced footnotes |
| migrate-cr tooling | Migrates legacy [^cr-] claim references to [^kb-] where the source URL matches a KB fact |
| Reference preprocessor | Unified handling of all DB-driven citation types at build time |
| What this proposal adds | What it enables for KB |
|---|---|
| Resource auto-registration | Every KB fact source gets a canonical resource entry with rich metadata |
| Unified bibliography | KB-sourced footnotes render with the same rich metadata as hand-curated resources |
| Full verification on exemplars | Verification data connects to KB facts, not just raw footnotes |
The KB sits between raw sources and wiki pages: facts are extracted from sources (with provenance), stored in YAML, and cited by pages via [^kb-] markers. Building the resource registry and unified rendering makes this pipeline visible to readers.
Status (as of March 2026)
Implemented
| Component | Location | Notes |
|---|---|---|
| Reference preprocessor | apps/web/src/lib/reference-preprocessor.ts | Handles [^cr-], [^rc-], and [^kb-] markers at MDX compile time. Replaces markers with numbered footnotes and appends auto-generated definitions. |
| KB fact references | [^kb-{factId}] syntax in MDX | Pages can cite KB facts directly. The preprocessor pulls value, source URL, date, and notes from KB YAML and generates a footnote definition. |
| migrate-cr command | pnpm crux footnotes migrate-cr | Converts legacy [^cr-] claim references to [^kb-] where the claim's source URL matches a KB fact's source. |
| CitationOverlay | apps/web/src/components/wiki/CitationOverlay.tsx | Injects verification dots onto footnote refs (still active, not yet replaced). |
| References component | apps/web/src/components/wiki/References.tsx | Renders resource YAML bibliography (still active, not yet replaced). |
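The URL-matching rule used by migrate-cr can be illustrated with the sketch below. It is not the implementation in crux/commands/footnotes.ts: the types, the function name, and the marker ID pattern are assumptions. When several KB facts share a source URL this sketch picks the first one, which a real tool would likely need to disambiguate.

```typescript
// Hypothetical shapes for claim references and KB facts.
interface ClaimReference {
  id: string;
  sourceUrl: string;
}
interface KbFact {
  factId: string;
  sourceUrl: string;
}

interface MigrationResult {
  mdx: string;
  migrated: number;
  skipped: number;
}

function migrateClaimReferences(
  mdx: string,
  claims: ClaimReference[],
  facts: KbFact[],
): MigrationResult {
  const claimUrl = new Map<string, string>();
  claims.forEach((c) => claimUrl.set(c.id, c.sourceUrl));

  // First fact per URL wins (an assumption; see the note above).
  const factByUrl = new Map<string, string>();
  facts.forEach((f) => {
    if (!factByUrl.has(f.sourceUrl)) factByUrl.set(f.sourceUrl, f.factId);
  });

  let migrated = 0;
  let skipped = 0;
  const out = mdx.replace(/\[\^cr-([A-Za-z0-9_-]+)\]/g, (match: string, id: string) => {
    const url = claimUrl.get(id);
    const factId = url === undefined ? undefined : factByUrl.get(url);
    if (factId === undefined) {
      skipped++; // no KB fact with the same source URL: keep the legacy marker
      return match;
    }
    migrated++;
    return `[^kb-${factId}]`;
  });
  return { mdx: out, migrated, skipped };
}
```

Leaving unmatched markers in place keeps the migration safe to run repeatedly as KB fact coverage grows.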
Not Yet Implemented (Proposed)
| Component | Description |
|---|---|
| Resource auto-registration (register-resources) | Auto-create resource YAML entries for footnote URLs that lack them. |
| Footnote index (build step in build-data.mjs) | Map every footnote number to its resource for the rendering layer. |
| UnifiedReferences | Single bibliography component replacing both remark-gfm footnotes and References. |
| InlineCitationCards | Enhanced hover cards replacing CitationOverlay with resource metadata + verification. |
| CSS footnote suppression | Hide the remark-gfm section[data-footnotes] once UnifiedReferences renders the bibliography. |
Migration Progress
- [^cr-] claim references: Legacy system. migrate-cr tooling exists to convert these to [^kb-] where source URLs match. Full migration is ongoing.
- [^rc-] citation references: Active and stable. Used for DB-backed citations that don't map to KB facts.
- [^kb-] KB fact references: The newest and preferred mechanism for structured data citations. Adoption is growing as KB fact coverage expands.
Open Questions
- Resource YAML organization: Should auto-created resources go in a separate file (auto-registered.yaml) or be sorted into existing category files?
- Page quality tiers: Automatic from quality score, or explicit frontmatter?
- How aggressive on hiding low-quality pages? Just from sidebar, or also from search?
- Person page exemplar: If we add a 4th exemplar later, which person page? Candidates: Dario Amodei, Eliezer Yudkowsky, Stuart Russell, Geoffrey Hinton.
- KB fact coverage threshold: At what point should all structured data citations use [^kb-] instead of [^rc-]? Should the improve pipeline auto-prefer KB facts when available?
- migrate-cr completion: Should the remaining [^cr-] references be bulk-migrated, or converted opportunistically as pages are improved?