Longterm Wiki
Updated 2026-03-13HistoryData
Page StatusDocumentationDashboard
Edited today1.0k words
Content2/12
LLM summaryScheduleEntityEdit history
Tables6/ ~1Diagrams9Int. links0/ ~8Ext. links0/ ~2Footnotes0References0/ ~1Quotes0Accuracy0

Database Schema Overview

The wiki-server PostgreSQL database has 42 active tables (plus 7 archived) organized into seven domains. This page documents every table, its purpose, key relationships, and the ongoing Phase 4a integer-PK migration.

Schema source of truth: apps/wiki-server/src/schema.ts (Drizzle ORM)


High-Level Architecture

Loading diagram...

Domain 1: Core Content

The foundational tables that mirror YAML/MDX source data into a queryable PostgreSQL store.

Loading diagram...

Key tables

TableRows (typical)Purpose
wiki_pages≈700Mirror of MDX pages. Full-text searchable via GIN index on search_vector. Dual-ID system: text id (legacy) + integer_id (Phase 4a).
entities≈600Mirror of data/entities/*.yaml. Organizations, people, concepts, risks, etc.
resources≈1,000Mirror of data/resources/*.yaml. Papers, blog posts, reports — the source library. fetch_status tracks source availability (ok/dead/paywall/error).
facts≈3,000Mirror of KB YAML. Numeric/string facts with timeseries support via measure + as_of. Unique on (entity_id, fact_id).
summaries≈400LLM-generated entity summaries. One per entity.
page_links≈10,000Directional knowledge graph. Link types: yaml_related, entity_link, name_prefix, similarity, shared_tag.
entity_ids≈1,000Central ID registry. Sequence-allocated. Used for stable E-prefixed URLs (e.g., /wiki/E42).

Phase 4a: Integer PK migration

Many tables have dual columns: page_id_old (TEXT, legacy) and page_id_int (INTEGER, new). The migration is in progress:

  • New writes go to both columns (dual-write)
  • Reads are migrating from old to new
  • Once reads are fully cut over, page_id_old columns will be dropped (Phase D2b)

Domain 2: Citation & Verification System

Tracks footnote citations, fetched source content, accuracy verification, and hallucination risk scoring.

Loading diagram...

Key tables

TablePurpose
citation_quotesPer-footnote citation data with LLM verification. accuracy_verdict: accurate, minor_issues, inaccurate, unsupported, not_verifiable. Unique on (page_id, footnote).
citation_contentCached fetched HTML/text from source URLs. Keyed by URL.
citation_accuracy_snapshotsPoint-in-time page-level citation health aggregations.
page_citationsRegular (non-claim) footnote citations. reference_id shared namespace with legacy claim references.
hallucination_risk_snapshotsLLM-assessed hallucination risk per page. Score 0–100, level: low/medium/high.
resource_citationsMany-to-many join: which resources are cited on which pages.

Domain 3: KB Fact Verification

Independent verification system for structured KB facts against external sources.

Loading diagram...

The verification system has two layers:

  1. Evidence (kb_fact_resource_verifications): One row per fact-resource check. Records what the LLM found.
  2. Conclusions (kb_fact_verdicts): One row per fact. Aggregated verdict derived from all evidence rows. Verdict options: confirmed, contradicted, unverifiable, outdated, partial, unchecked.

Domain 4: Auto-Update System

Daily automated content updates driven by RSS feeds and web searches.

Loading diagram...

Pipeline flow: Fetch sourcesScore relevanceRoute to pagesRun improve pipelineRecord results


Domain 5: Agent & Session Tracking

Coordinates concurrent Claude Code agents and records session history.

Loading diagram...

Key tables

TablePurpose
active_agentsLive agent coordination. Heartbeat-based stale detection (>30 min). Status: active, completed, errored, stale.
agent_session_eventsAudit trail timeline. Event types: registered, checklist_check, status_update, error, note, completed.
agent_sessionsFull session lifecycle. Stores checklist, PR outcome, cost, duration, and session log fields.
sessionsLegacy session log (being superseded by agent_sessions).

Domain 6: Financial Data

Structured financial records for AI companies and organizations — funding, personnel, equity.

Loading diagram...

Design notes

  • NUMERIC type for amounts (not FLOAT) — preserves exact financial values
  • TEXT references for person/org IDs — supports both entity IDs and display names for non-entity records (e.g., "D. E. Shaw Research")
  • 10-char VARCHAR primary keys — stable IDs allocated from entity_ids sequence
  • Temporal support: start_date/end_date for personnel and division_personnel, as_of/valid_end for equity positions
  • Divisions model org sub-units (funds, teams, departments, labs, program areas) and link to personnel and funding programs

Domain 7: Infrastructure

Background job processing, maintenance tracking, and incident monitoring.

Loading diagram...

Key tables

TablePurpose
jobsBackground task queue. Workers claim via SELECT ... FOR UPDATE SKIP LOCKED. Status: pending → claimed → running → completed/failed.
groundskeeper_runsMaintenance daemon execution history. Circuit breaker pattern: tracks consecutive failures.
service_health_incidentsInfrastructure outage tracking. Severity: critical/high/medium/low. Status: open → investigating → resolved.
propertiesControlled vocabulary for structured data. Defines attributes like "valuation", "headcount", "ceo". Referenced by the (now archived) statements system.

Archived Tables

Seven tables were archived in migration 0065 (March 2026). They are renamed to _archived_* and kept for backward-compatible FK references but are no longer written to.

Archived TableOriginal PurposeReplacement
_archived_claimsExtracted factual claimsKB facts system (packages/kb/data/things/)
_archived_claim_sourcesPer-claim resource linkingKB fact resource verifications
_archived_claim_page_referencesClaim-to-page mappingPage citations + KB components
_archived_statementsTyped structured/attributed factsKB facts YAML + facts table
_archived_statement_citationsStatement resource linkingKB fact resource verifications
_archived_statement_page_referencesStatement-to-page mappingPage citations
_archived_entity_coverage_scoresEntity quality snapshotsKB-level quality metrics

Full Relationship Map

This diagram shows all active foreign key relationships in the database.

Loading diagram...

Connection Architecture

Application Pool:  max 10 connections, statement_timeout: 30s
Migration Pool:    max 1 connection, statement_timeout: unlimited, lock_timeout: 60s
  • DATABASE_URL → application queries
  • DATABASE_MIGRATION_URL → DDL migrations (falls back to DATABASE_URL)
  • Drizzle ORM for schema management, migrations run on server startup
  • 78 migration files tracking schema evolution since project inception

Migration History

PhaseMigrationsPeriodKey Changes
Phase 00000–0007InitialEntity, fact, claim, resource systems
Phase 10008–0014Auto-update, performance tuning, page links
Phase 20015–0020Entity data model, facts timeseries, jobs, agent sessions
Phase 30021–0045Claims → statements unification, hallucination risk, health monitoring
Phase 4a0046–0066Integer PK migration (dual-write strategy), table archival
Phase 50067–0077CurrentKB verification, personnel, grants, funding, equity, investments, divisions, funding programs