Longterm Wiki
Updated 2026-03-13

Knowledge Base Architecture

Overview

The Knowledge Base (KB) is the wiki's structured data system. It stores curated facts about entities in YAML files, validates them against typed schemas, and feeds them into the wiki's build pipeline where they power inline components, hover tooltips, comparison tables, and entity sidebar panels.

The KB lives in packages/kb/ and consists of:

  • 362 entity files in packages/kb/data/things/*.yaml (organizations, people, AI models, risks, etc.)
  • 95 properties defined in packages/kb/data/properties.yaml with display formatting, type constraints, and inverse relationships
  • 19 entity type schemas in packages/kb/data/schemas/*.yaml specifying required and recommended properties plus record collection definitions
  • 23 validation rules enforced at build time

Facts flow through the build as: YAML files -> in-memory Graph (loader.ts) -> serialize -> database.json -> Next.js, which reads the file at build time. Wiki pages therefore need no runtime API calls for structured data.

Core Data Model

Entities

Each entity is a YAML file in packages/kb/data/things/. The file contains an entity header, a list of facts, and optional record collections:

thing:
  id: anthropic          # Human-readable slug
  stableId: mK9pX3rQ7n  # Random 10-char ID that survives renames
  type: organization     # References a TypeSchema
  name: Anthropic
  numericId: "E22"       # Legacy wiki page ID
  aliases: [Anthropic PBC, Anthropic AI]

facts:
  - id: f_rev_2026_02
    property: revenue
    value: 14e9
    asOf: 2026-02
    source: https://www.reuters.com/...
    sourceResource: 8e3ff50b9ef2a1a8
    notes: "Run-rate revenue at Series G announcement"

items:
  funding-rounds:
    type: funding-round
    entries:
      i_OVNz9C3XUA:
        date: 2026-02
        amount: 30e9
        valuation: 380e9
        lead_investor: gic
        source: https://www.reuters.com/...

Facts

A fact is a typed, source-attributed assertion about an entity. The Fact interface supports:

| Field | Purpose |
|---|---|
| id | Unique identifier (prefixed f_ for authored, inv_ for computed inverses) |
| property | References a property in the registry (e.g., revenue, founded-by) |
| value | Typed: number, text, date, boolean, ref, refs, range, min, or json |
| asOf | When this was true (ISO date: YYYY, YYYY-MM, or YYYY-MM-DD) |
| validEnd | When this stopped being true (omit if still current) |
| source | URL to the primary source |
| sourceResource | ID linking to the wiki's curated resource registry |
| sourceQuote | Relevant excerpt from the source |
| notes | Free-text annotation explaining caveats or methodology |
| currency | ISO 4217 override for non-USD monetary values (e.g., GBP) |
| usdEquivalent | Approximate USD conversion for cross-currency comparison |
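
The fields above can be summarized as a TypeScript type. This is a sketch inferred from the table, not the definition used in packages/kb/: the FactValue union, the encoding of range and refs values, and which fields are optional are all assumptions.

```typescript
// Hypothetical shape inferred from the field table; the real interface may differ.
type FactValue =
  | number
  | string
  | boolean
  | string[]                       // refs: list of entity IDs (assumed encoding)
  | { min?: number; max?: number } // range / min (assumed encoding)
  | Record<string, unknown>;       // json

interface Fact {
  id: string;
  property: string;
  value: FactValue;
  asOf?: string;           // YYYY, YYYY-MM, or YYYY-MM-DD
  validEnd?: string;       // omitted while the fact is still current
  source?: string;
  sourceResource?: string;
  sourceQuote?: string;
  notes?: string;
  currency?: string;       // ISO 4217 code for non-USD values
  usdEquivalent?: number;
}

// The revenue fact from the Anthropic example, expressed against this type.
const revenueFact: Fact = {
  id: "f_rev_2026_02",
  property: "revenue",
  value: 14e9,
  asOf: "2026-02",
  source: "https://www.reuters.com/...",
  sourceResource: "8e3ff50b9ef2a1a8",
  notes: "Run-rate revenue at Series G announcement",
};

// Authored facts use the f_ prefix; computed inverses use inv_.
const isAuthored = (fact: Fact): boolean => fact.id.startsWith("f_");
console.log(isAuthored(revenueFact));
```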

Properties

Properties are the taxonomy that gives facts meaning. Defined in packages/kb/data/properties.yaml, each property specifies:

  • Data type: number, text, date, boolean, ref, refs
  • Unit: USD, percent, tokens, employees, etc.
  • Category: financial, people, safety, model, risk, policy, etc.
  • Display config: divisor, prefix, suffix for human-readable formatting (e.g., $14.0B from 14e9)
  • Temporal flag: whether values change over time (revenue, headcount) or are static (founded-date)
  • Applies-to constraint: which entity types can use this property
  • Inverse relationships: employed-by automatically generates employer-of on the referenced entity

There are currently 95 properties across 21 categories, supporting entity types from organizations and people to AI models, risks, policies, and incidents.
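
To illustrate the display config, here is a minimal sketch of how a divisor, prefix, and suffix could turn 14e9 into $14.0B. The DisplayConfig type, the formatValue function, and the decimals field are hypothetical names chosen for this example; the actual formatter may work differently.

```typescript
// Hypothetical display config; only divisor, prefix, and suffix come from the text above.
interface DisplayConfig {
  divisor?: number;
  prefix?: string;
  suffix?: string;
  decimals?: number; // assumed field; the examples on this page show one decimal place
}

function formatValue(value: number, display: DisplayConfig): string {
  const { divisor = 1, prefix = "", suffix = "", decimals = 1 } = display;
  return `${prefix}${(value / divisor).toFixed(decimals)}${suffix}`;
}

const usdBillions: DisplayConfig = { divisor: 1e9, prefix: "$", suffix: "B" };
console.log(formatValue(14e9, usdBillions));  // "$14.0B"
console.log(formatValue(380e9, usdBillions)); // "$380.0B"
```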

Inverse Relationships

When a property declares an inverseId, the KB automatically computes mirror facts. For example:

  • employed-by on a person entity (e.g., Jan Leike -> Anthropic) generates an employer-of fact on the organization entity (Anthropic -> Jan Leike)
  • founded-by on an organization generates founder-of on each referenced person
  • developed-by on an AI model generates developed-models on the organization

Inverse facts are content-addressed: each gets a deterministic ID based on a SHA-256 hash of subject + property + value, and carries a derivedFrom field pointing to its source fact. The computed: true flag on the inverse property marks it as derived, so authors should not store both sides manually; the validator flags this as bidirectional redundancy.
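
The mechanism can be sketched as follows. This is an illustration under stated assumptions, not the KB's computeInverses(): the inverseOf map, the "|" separator, the 16-character ID truncation, and the restriction to single-valued ref facts are all choices made for this example. Only createHash from Node's crypto module is a real API.

```typescript
import { createHash } from "node:crypto";

interface Fact {
  id: string;
  property: string;
  value: string;        // for ref properties: the referenced entity's ID
  derivedFrom?: string; // set on computed inverses
}

// Hypothetical registry excerpt: property ID -> inverseId.
const inverseOf: Record<string, string> = {
  "employed-by": "employer-of",
  "founded-by": "founder-of",
};

// Deterministic ID from subject + property + value (separator and length are assumed).
function inverseFactId(subject: string, property: string, value: string): string {
  const digest = createHash("sha256")
    .update(`${subject}|${property}|${value}`)
    .digest("hex");
  return `inv_${digest.slice(0, 16)}`;
}

// Returns [targetEntityId, inverseFact] pairs for every fact whose property has an inverse.
function computeInversesSketch(entityId: string, facts: Fact[]): Array<[string, Fact]> {
  const result: Array<[string, Fact]> = [];
  for (const fact of facts) {
    const inverseProperty = inverseOf[fact.property];
    if (inverseProperty === undefined) continue;
    const target = fact.value;
    result.push([
      target,
      {
        id: inverseFactId(target, inverseProperty, entityId),
        property: inverseProperty,
        value: entityId,
        derivedFrom: fact.id,
      },
    ]);
  }
  return result;
}

const inverses = computeInversesSketch("jan-leike", [
  { id: "f_emp_1", property: "employed-by", value: "anthropic" },
]);
console.log(inverses);
```

Because the ID depends only on the three inputs, rebuilding the graph yields the same inverse fact IDs each time.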

Type Schemas

Each entity type has a schema in packages/kb/data/schemas/ that defines:

  • Required properties: must have at least one fact (validation error if missing)
  • Recommended properties: should have facts (validation warning if missing)
  • Record collection schemas: structured sub-data with typed fields

For example, the organization schema recommends founded-date, headquarters, revenue, valuation, headcount, legal-structure, and total-funding. It also defines record collections for funding-rounds, key-people, products, model-releases, board-members, strategic-partnerships, safety-milestones, research-areas, and grants-and-programs.

The 19 entity types with schemas are: ai-model, analysis, approach, argument, capability, case-study, concept, debate, event, funder, historical, incident, organization, person, policy, project, risk, risk-factor, and safety-agenda.
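
A schema check of this kind might look like the sketch below. The TypeSchema and Finding types, the checkSchema function, and the example schema contents are hypothetical; missing recommended properties are reported as warnings here, following the Validation list later on this page.

```typescript
type Severity = "error" | "warning";

// Hypothetical, simplified schema: record collection definitions are omitted.
interface TypeSchema {
  required: string[];
  recommended: string[];
}

interface Finding {
  severity: Severity;
  message: string;
}

// A property counts as present when the entity has at least one fact for it.
function checkSchema(schema: TypeSchema, factProperties: string[]): Finding[] {
  const present = new Set(factProperties);
  const findings: Finding[] = [];
  for (const property of schema.required) {
    if (!present.has(property)) {
      findings.push({ severity: "error", message: `missing required property: ${property}` });
    }
  }
  for (const property of schema.recommended) {
    if (!present.has(property)) {
      findings.push({ severity: "warning", message: `missing recommended property: ${property}` });
    }
  }
  return findings;
}

// Illustrative schema only; it does not reproduce any real schema file.
const exampleSchema: TypeSchema = {
  required: ["founded-date"],
  recommended: ["headquarters", "revenue"],
};

const findings = checkSchema(exampleSchema, ["revenue"]);
console.log(findings);
```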

Record Collections

Record collections store ordered sub-data that does not fit the simple property-value model. They are defined per entity type in the schema and stored inline in each entity's YAML file.

Examples:

| Collection | Entity Type | Fields |
|---|---|---|
| funding-rounds | organization | date, amount, valuation, lead_investor, source, notes |
| key-people | organization | person (ref), title, start, end, is_founder, source, notes |
| products | organization | name, launched, description, source, notes |
| model-releases | organization | name, released, description, safety_level, source, notes |
| board-members | organization | name, role, appointed, departed, background, appointed_by, source |
| safety-milestones | organization | name, date, type, description, source, notes |

Each entry has a stable key (e.g., i_OVNz9C3XUA) and typed fields validated against the schema. Fields can reference other entities via !ref YAML tags (e.g., person: !ref zR4nW8xB2f:dario-amodei), which are cross-validated during loading.
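
As an illustration of that cross-validation, the sketch below parses the stableId:slug body of a !ref tag and resolves it against an index of known entities. The function names, the index shape, and the choice to treat the stableId as authoritative when the slug has changed are assumptions, not the loader's actual behavior.

```typescript
interface ParsedRef {
  stableId: string;
  slug: string;
}

// Splits a ref body such as "zR4nW8xB2f:dario-amodei" into its two parts.
function parseRef(raw: string): ParsedRef {
  const colon = raw.indexOf(":");
  const stableId = colon === -1 ? "" : raw.slice(0, colon);
  const slug = colon === -1 ? "" : raw.slice(colon + 1);
  if (!/^[A-Za-z0-9]{10}$/.test(stableId) || slug === "") {
    throw new Error(`malformed ref: ${raw}`);
  }
  return { stableId, slug };
}

// Resolves a ref against a stableId -> current slug index.
// Assumption: a ref whose slug is outdated still resolves, but is reported as stale.
function resolveRef(
  raw: string,
  slugByStableId: Map<string, string>,
): { id: string; slugIsStale: boolean } {
  const { stableId, slug } = parseRef(raw);
  const currentSlug = slugByStableId.get(stableId);
  if (currentSlug === undefined) {
    throw new Error(`ref points to unknown entity: ${raw}`);
  }
  return { id: currentSlug, slugIsStale: currentSlug !== slug };
}

const slugIndex = new Map([["zR4nW8xB2f", "dario-amodei"]]);
console.log(resolveRef("zR4nW8xB2f:dario-amodei", slugIndex));
```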

MDX Components

The KB powers several MDX components used in wiki pages:

<KBF> - Inline Fact Display

Renders a single KB fact with a hover tooltip showing source, date, and notes:

<KBF entity="anthropic" property="valuation" />          {/* Auto-formatted: "$380.0B" */}
<KBF entity="anthropic" property="revenue" showDate />   {/* "$19.0B (as of 2026-03)" */}
<KBF entity="anthropic" property="valuation">$380B</KBF> {/* Custom display text */}

<KBFactValue> - Block Fact Display

Renders a fact value with more context, used in structured sections.

<Calc> - Computed Values

Derives a value from multiple facts at render time:

<Calc expr="{anthropic.valuation} / {anthropic.revenue}" precision={0} suffix="x" />
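
One way such an expression could be evaluated is sketched below: placeholders of the form {entity.property} are replaced with numeric fact values, and the result is parsed as arithmetic. The evalCalc function, the FactLookup type, and the supported operators (+, -, *, /, and parentheses) are assumptions for illustration; the real component may support a different grammar.

```typescript
type FactLookup = (entity: string, property: string) => number;

// Substitutes {entity.property} placeholders, then evaluates + - * / and parentheses.
// Characters outside numbers and operators are ignored by this simplified tokenizer.
function evalCalc(expr: string, lookup: FactLookup): number {
  const resolved = expr.replace(
    /\{([\w-]+)\.([\w-]+)\}/g,
    (_match, entity: string, property: string) => String(lookup(entity, property)),
  );
  const tokens: string[] = resolved.match(/\d+(?:\.\d+)?(?:e[+-]?\d+)?|[-+*/()]/gi) ?? [];
  let pos = 0;

  function parsePrimary(): number {
    const token = tokens[pos++];
    if (token === undefined) throw new Error("unexpected end of expression");
    if (token === "(") {
      const inner = parseSum();
      if (tokens[pos++] !== ")") throw new Error("expected closing parenthesis");
      return inner;
    }
    if (token === "-") return -parsePrimary();
    const value = Number(token);
    if (Number.isNaN(value)) throw new Error(`unexpected token: ${token}`);
    return value;
  }

  function parseProduct(): number {
    let left = parsePrimary();
    while (tokens[pos] === "*" || tokens[pos] === "/") {
      const op = tokens[pos++];
      const right = parsePrimary();
      left = op === "*" ? left * right : left / right;
    }
    return left;
  }

  function parseSum(): number {
    let left = parseProduct();
    while (tokens[pos] === "+" || tokens[pos] === "-") {
      const op = tokens[pos++];
      const right = parseProduct();
      left = op === "+" ? left + right : left - right;
    }
    return left;
  }

  const result = parseSum();
  if (pos !== tokens.length) throw new Error("unexpected trailing tokens");
  return result;
}

// Values taken from the Anthropic YAML example earlier on this page.
const factValues: Record<string, number> = {
  "anthropic.valuation": 380e9,
  "anthropic.revenue": 14e9,
};
const lookupFact: FactLookup = (entity, property) => {
  const value = factValues[`${entity}.${property}`];
  if (value === undefined) throw new Error(`no fact for ${entity}.${property}`);
  return value;
};

const ratio = evalCalc("{anthropic.valuation} / {anthropic.revenue}", lookupFact);
console.log(`${ratio.toFixed(0)}x`); // "27x"
```

A small hand-written parser avoids passing page-authored strings to eval.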

KB Footnotes

Facts referenced via <KBF> automatically generate footnote citations using the [^1] format. These appear in the page's footnote section with full source attribution.

Entity Sidebar and Fact Tables

Entity pages include an auto-generated sidebar panel showing key facts (KBEntitySidebar) and comparison tables (KBCompareTable, KBFactTable) for cross-entity analysis.

Validation

The KB runs 23 validation checks at build time, organized into three severity levels:

Errors (block build):

  1. Required properties missing
  2. Ref/refs point to nonexistent entities
  3. stableId format invalid (must be 10 alphanumeric chars)
  4. Fact ID format invalid (must start with f_ or inv_)
  5. Entity has empty name
  6. validEnd is earlier than asOf
  7. Duplicate stableIds across entities
  8. Record collection field type mismatches or missing required fields
  9. Range value bounds invalid

Warnings:

  1. Recommended properties missing
  2. Property used on wrong entity type
  3. Temporal property missing asOf date
  4. Non-temporal property has multiple facts
  5. Stale temporal data (>2 years old for financial/product categories)
  6. Duplicate facts (same entity + property + asOf)
  7. Missing source URL
  8. Unknown property ID
  9. Invalid date format
  10. Future asOf date
  11. Bidirectional redundancy (both sides of an inverse stored)
  12. Unknown currency code

Informational:

  1. Orphan entity (no facts and no records)
  2. Completeness percentage per entity
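
A few of these checks can be sketched as follows. This is not the KB's validate() function: the rule names, the KbEntity and KbFact shapes, and the blocksBuild helper are hypothetical, and only five of the 23 rules are shown.

```typescript
type Severity = "error" | "warning" | "info";

interface Issue {
  severity: Severity;
  rule: string; // hypothetical rule identifiers
}

interface KbFact {
  id: string;
  asOf?: string;
  validEnd?: string;
  source?: string;
}

interface KbEntity {
  stableId: string;
  name: string;
  facts: KbFact[];
}

function validateEntity(entity: KbEntity): Issue[] {
  const issues: Issue[] = [];
  if (!/^[A-Za-z0-9]{10}$/.test(entity.stableId)) {
    issues.push({ severity: "error", rule: "stableid-format" });
  }
  if (entity.name.trim() === "") {
    issues.push({ severity: "error", rule: "empty-name" });
  }
  for (const fact of entity.facts) {
    if (!/^(f_|inv_)/.test(fact.id)) {
      issues.push({ severity: "error", rule: "fact-id-format" });
    }
    // Simplification: ISO dates are compared as strings, which is only
    // reliable when both dates use the same precision.
    if (fact.asOf !== undefined && fact.validEnd !== undefined && fact.validEnd < fact.asOf) {
      issues.push({ severity: "error", rule: "validend-before-asof" });
    }
    if (fact.source === undefined) {
      issues.push({ severity: "warning", rule: "missing-source" });
    }
  }
  return issues;
}

// Only errors are described as blocking the build.
const blocksBuild = (issues: Issue[]): boolean =>
  issues.some((issue) => issue.severity === "error");

const goodEntity: KbEntity = {
  stableId: "mK9pX3rQ7n",
  name: "Anthropic",
  facts: [{ id: "f_rev_2026_02", asOf: "2026-02", source: "https://www.reuters.com/..." }],
};
console.log(validateEntity(goodEntity)); // no issues for this entity
```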

Data Pipeline

The KB integrates into the wiki's build pipeline:

  1. Load: loadKB() reads YAML files in two passes: the first pass builds the stableId index, and the second pass resolves !ref tags and loads facts and items
  2. Compute inverses: computeInverses() generates mirror facts for all properties with inverseId
  3. Validate: validate() runs all 23 checks across the graph
  4. Serialize: serialize() converts the in-memory Graph to a plain JSON object
  5. Build-data: apps/web/scripts/build-data.mjs merges KB data into database.json
  6. Render: Next.js reads database.json at build time; MDX components look up facts by entity + property

The in-memory Graph class exposes a query API: getEntity(), getFacts(), getLatest(), getByProperty(), getRelated(), getItems(), and getItemsMentioning().
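
To make the query methods concrete, here is a sketch of how getFacts() and getLatest() might behave. The GraphSketch class, its method signatures, and the rule that "latest" means the greatest asOf value are assumptions; the fact IDs are made up, and the second revenue value is taken from the <KBF> example above.

```typescript
interface Fact {
  id: string;
  property: string;
  value: number;
  asOf?: string;
}

// Hypothetical, heavily reduced stand-in for the real Graph class.
class GraphSketch {
  private factsByEntity = new Map<string, Fact[]>();

  addFacts(entityId: string, facts: Fact[]): void {
    const existing = this.factsByEntity.get(entityId) ?? [];
    this.factsByEntity.set(entityId, [...existing, ...facts]);
  }

  getFacts(entityId: string, property?: string): Fact[] {
    const facts = this.factsByEntity.get(entityId) ?? [];
    return property === undefined ? facts : facts.filter((f) => f.property === property);
  }

  // Picks the fact with the greatest asOf. ISO date strings sort chronologically
  // as plain strings; undated facts sort first because "" precedes any date.
  getLatest(entityId: string, property: string): Fact | undefined {
    let latest: Fact | undefined;
    for (const fact of this.getFacts(entityId, property)) {
      if (latest === undefined || (fact.asOf ?? "") > (latest.asOf ?? "")) {
        latest = fact;
      }
    }
    return latest;
  }
}

const graph = new GraphSketch();
graph.addFacts("anthropic", [
  { id: "f_rev_2026_02", property: "revenue", value: 14e9, asOf: "2026-02" },
  { id: "f_rev_2026_03", property: "revenue", value: 19e9, asOf: "2026-03" }, // hypothetical ID
]);
console.log(graph.getLatest("anthropic", "revenue")?.value); // 19000000000
```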

Relationship to Statements in PostgreSQL

The wiki-server's PostgreSQL database stores statements (formerly called "claims") - text-level assertions extracted from wiki page prose. These are complementary to KB facts, not a replacement:

| Dimension | KB Facts (YAML) | Statements (PostgreSQL) |
|---|---|---|
| Purpose | Curated canonical data | Page text analysis |
| Storage | YAML files in git | PostgreSQL |
| Authoring | Manual curation | LLM-extracted from prose |
| Data types | Typed (number, date, ref, range, etc.) | Free text with optional structured fields |
| Temporal | asOf / validEnd | valueDate |
| Source tracking | source, sourceResource, sourceQuote | claim_sources join table |
| Display | <KBF>, <Calc>, tooltips | Statements table on entity pages |
| Build dependency | Yes (feeds database.json) | No (runtime queries) |

KB facts are the authoritative source for structured data that appears across wiki pages. Statements provide a complementary view of what the prose says, useful for citation checking, hallucination detection, and verifying consistency between page text and canonical data.

Current Status

| Component | Status |
|---|---|
| KB YAML data layer (362 entities) | Working |
| Property taxonomy (95 properties) | Working |
| 19 entity type schemas with record collections | Working |
| 23 validation rules | Working |
| Inverse relationship computation | Working |
| !ref and !date YAML tags with cross-validation | Working |
| Currency support with USD equivalents | Working |
| <KBF> inline fact display with tooltips | Working |
| <KBFactValue> block display | Working |
| <Calc> computed values | Working |
| KB footnotes integration | Working |
| Entity sidebar and fact tables | Working |
| Comparison tables (KBCompareTable) | Working |
| Build pipeline integration (YAML -> database.json) | Working |

Future Directions

Several areas are under consideration for extending the KB system:

  • Broader entity coverage: Expanding beyond the current 362 entities to cover more of the wiki's ~700 pages with structured facts
  • Richer record collection types: Adding more collection schemas (e.g., regulatory actions, benchmark results, governance events) to entity types beyond organization
  • Temporal queries in components: Enabling <KBF> to reference facts at specific dates (e.g., "Anthropic's revenue as of 2025-03") rather than always showing the latest
  • Cross-entity computed properties: Supporting <Calc> expressions that aggregate across entities (e.g., total funding across all AI safety organizations)
  • Statement-KB reconciliation: Automated checks that verify page prose (statements) stays consistent with canonical KB facts, flagging contradictions for human review
  • Property completeness tracking: Dashboard showing which entity types have the best coverage and where data gaps exist