Database Schema Overview
The wiki-server PostgreSQL database has 42 active tables (plus 7 archived) organized into seven domains. This page documents every table, its purpose, key relationships, and the ongoing Phase 4a integer-PK migration.
Schema source of truth: apps/wiki-server/src/schema.ts (Drizzle ORM)
High-Level Architecture
Domain 1: Core Content
The foundational tables that mirror YAML/MDX source data into a queryable PostgreSQL store.
Key tables
| Table | Rows (typical) | Purpose |
|---|---|---|
| wiki_pages | ≈700 | Mirror of MDX pages. Full-text searchable via GIN index on search_vector. Dual-ID system: text id (legacy) + integer_id (Phase 4a). |
| entities | ≈600 | Mirror of data/entities/*.yaml. Organizations, people, concepts, risks, etc. |
| resources | ≈1,000 | Mirror of data/resources/*.yaml. Papers, blog posts, reports — the source library. fetch_status tracks source availability (ok/dead/paywall/error). |
| facts | ≈3,000 | Mirror of KB YAML. Numeric/string facts with timeseries support via measure + as_of. Unique on (entity_id, fact_id). |
| summaries | ≈400 | LLM-generated entity summaries. One per entity. |
| page_links | ≈10,000 | Directional knowledge graph. Link types: yaml_related, entity_link, name_prefix, similarity, shared_tag. |
| entity_ids | ≈1,000 | Central ID registry. Sequence-allocated. Used for stable E-prefixed URLs (e.g., /wiki/E42). |
Phase 4a: Integer PK migration
Many tables have dual columns: page_id_old (TEXT, legacy) and page_id_int (INTEGER, new). The migration is in progress:
- New writes go to both columns (dual-write)
- Reads are migrating from old to new
- Once reads are fully cut over,
page_id_oldcolumns will be dropped (Phase D2b)
Domain 2: Citation & Verification System
Tracks footnote citations, fetched source content, accuracy verification, and hallucination risk scoring.
Key tables
| Table | Purpose |
|---|---|
| citation_quotes | Per-footnote citation data with LLM verification. accuracy_verdict: accurate, minor_issues, inaccurate, unsupported, not_verifiable. Unique on (page_id, footnote). |
| citation_content | Cached fetched HTML/text from source URLs. Keyed by URL. |
| citation_accuracy_snapshots | Point-in-time page-level citation health aggregations. |
| page_citations | Regular (non-claim) footnote citations. reference_id shared namespace with legacy claim references. |
| hallucination_risk_snapshots | LLM-assessed hallucination risk per page. Score 0–100, level: low/medium/high. |
| resource_citations | Many-to-many join: which resources are cited on which pages. |
Domain 3: KB Fact Verification
Independent verification system for structured KB facts against external sources.
The verification system has two layers:
- Evidence (
kb_fact_resource_verifications): One row per fact-resource check. Records what the LLM found. - Conclusions (
kb_fact_verdicts): One row per fact. Aggregated verdict derived from all evidence rows. Verdict options: confirmed, contradicted, unverifiable, outdated, partial, unchecked.
Domain 4: Auto-Update System
Daily automated content updates driven by RSS feeds and web searches.
Pipeline flow: Fetch sources → Score relevance → Route to pages → Run improve pipeline → Record results
Domain 5: Agent & Session Tracking
Coordinates concurrent Claude Code agents and records session history.
Key tables
| Table | Purpose |
|---|---|
| active_agents | Live agent coordination. Heartbeat-based stale detection (>30 min). Status: active, completed, errored, stale. |
| agent_session_events | Audit trail timeline. Event types: registered, checklist_check, status_update, error, note, completed. |
| agent_sessions | Full session lifecycle. Stores checklist, PR outcome, cost, duration, and session log fields. |
| sessions | Legacy session log (being superseded by agent_sessions). |
Domain 6: Financial Data
Structured financial records for AI companies and organizations — funding, personnel, equity.
Design notes
- NUMERIC type for amounts (not FLOAT) — preserves exact financial values
- TEXT references for person/org IDs — supports both entity IDs and display names for non-entity records (e.g., "D. E. Shaw Research")
- 10-char VARCHAR primary keys — stable IDs allocated from
entity_idssequence - Temporal support:
start_date/end_datefor personnel and division_personnel,as_of/valid_endfor equity positions - Divisions model org sub-units (funds, teams, departments, labs, program areas) and link to personnel and funding programs
Domain 7: Infrastructure
Background job processing, maintenance tracking, and incident monitoring.
Key tables
| Table | Purpose |
|---|---|
| jobs | Background task queue. Workers claim via SELECT ... FOR UPDATE SKIP LOCKED. Status: pending → claimed → running → completed/failed. |
| groundskeeper_runs | Maintenance daemon execution history. Circuit breaker pattern: tracks consecutive failures. |
| service_health_incidents | Infrastructure outage tracking. Severity: critical/high/medium/low. Status: open → investigating → resolved. |
| properties | Controlled vocabulary for structured data. Defines attributes like "valuation", "headcount", "ceo". Referenced by the (now archived) statements system. |
Archived Tables
Seven tables were archived in migration 0065 (March 2026). They are renamed to _archived_* and kept for backward-compatible FK references but are no longer written to.
| Archived Table | Original Purpose | Replacement |
|---|---|---|
_archived_claims | Extracted factual claims | KB facts system (packages/kb/data/things/) |
_archived_claim_sources | Per-claim resource linking | KB fact resource verifications |
_archived_claim_page_references | Claim-to-page mapping | Page citations + KB components |
_archived_statements | Typed structured/attributed facts | KB facts YAML + facts table |
_archived_statement_citations | Statement resource linking | KB fact resource verifications |
_archived_statement_page_references | Statement-to-page mapping | Page citations |
_archived_entity_coverage_scores | Entity quality snapshots | KB-level quality metrics |
Full Relationship Map
This diagram shows all active foreign key relationships in the database.
Connection Architecture
Application Pool: max 10 connections, statement_timeout: 30s
Migration Pool: max 1 connection, statement_timeout: unlimited, lock_timeout: 60s
DATABASE_URL→ application queriesDATABASE_MIGRATION_URL→ DDL migrations (falls back toDATABASE_URL)- Drizzle ORM for schema management, migrations run on server startup
- 78 migration files tracking schema evolution since project inception
Migration History
| Phase | Migrations | Period | Key Changes |
|---|---|---|---|
| Phase 0 | 0000–0007 | Initial | Entity, fact, claim, resource systems |
| Phase 1 | 0008–0014 | — | Auto-update, performance tuning, page links |
| Phase 2 | 0015–0020 | — | Entity data model, facts timeseries, jobs, agent sessions |
| Phase 3 | 0021–0045 | — | Claims → statements unification, hallucination risk, health monitoring |
| Phase 4a | 0046–0066 | — | Integer PK migration (dual-write strategy), table archival |
| Phase 5 | 0067–0077 | Current | KB verification, personnel, grants, funding, equity, investments, divisions, funding programs |