Knowledge Base Architecture
Overview
The Knowledge Base (KB) is the wiki's structured data system. It stores curated facts about entities in YAML files, validates them against typed schemas, and feeds them into the wiki's build pipeline where they power inline components, hover tooltips, comparison tables, and entity sidebar panels.
The KB lives in packages/kb/ and consists of:
- 362 entity files in packages/kb/data/things/*.yaml (organizations, people, AI models, risks, etc.)
- 95 properties defined in packages/kb/data/properties.yaml with display formatting, type constraints, and inverse relationships
- 19 entity type schemas in packages/kb/data/schemas/*.yaml specifying required and recommended properties plus record collection definitions
- 23 validation rules enforced at build time
Facts flow through the wiki as: YAML -> Graph (loader.ts) -> serialize -> database.json -> Next.js reads at build time. No runtime API calls are needed for structured data on wiki pages.
Core Data Model
Entities
Each entity is a YAML file in packages/kb/data/things/. The file contains an entity header, a list of facts, and optional record collections:
thing:
  id: anthropic            # Human-readable slug
  stableId: mK9pX3rQ7n     # Random 10-char ID that survives renames
  type: organization       # References a TypeSchema
  name: Anthropic
  numericId: "E22"         # Legacy wiki page ID
  aliases: [Anthropic PBC, Anthropic AI]
facts:
  - id: f_rev_2026_02
    property: revenue
    value: 14e9
    asOf: 2026-02
    source: https://www.reuters.com/...
    sourceResource: 8e3ff50b9ef2a1a8
    notes: "Run-rate revenue at Series G announcement"
items:
  funding-rounds:
    type: funding-round
    entries:
      i_OVNz9C3XUA:
        date: 2026-02
        amount: 30e9
        valuation: 380e9
        lead_investor: gic
        source: https://www.reuters.com/...
Facts
A fact is a typed, source-attributed assertion about an entity. The Fact interface supports:
| Field | Purpose |
|---|---|
| id | Unique identifier (prefixed f_ for authored, inv_ for computed inverses) |
| property | References a property in the registry (e.g., revenue, founded-by) |
| value | Typed: number, text, date, boolean, ref, refs, range, min, or json |
| asOf | When this was true (ISO date: YYYY, YYYY-MM, or YYYY-MM-DD) |
| validEnd | When this stopped being true (omit if still current) |
| source | URL to the primary source |
| sourceResource | ID linking to the wiki's curated resource registry |
| sourceQuote | Relevant excerpt from the source |
| notes | Free-text annotation explaining caveats or methodology |
| currency | ISO 4217 override for non-USD monetary values (e.g., GBP) |
| usdEquivalent | Approximate USD conversion for cross-currency comparison |
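As a rough sketch, the fields in this table could be modeled in TypeScript as shown here. The real Fact interface is not reproduced on this page, so the FactValue union, the choice of which fields are optional, and the isCurrent helper are assumptions made for illustration.

```typescript
// Hypothetical shape inferred from the field table; the real interface may differ.
// Assumed encodings: refs as string[], range/min/json as plain objects.
type FactValue = number | string | boolean | string[] | Record<string, unknown>;

interface Fact {
  id: string;              // "f_..." for authored facts, "inv_..." for computed inverses
  property: string;        // property ID from the registry
  value: FactValue;
  asOf?: string;           // YYYY, YYYY-MM, or YYYY-MM-DD
  validEnd?: string;       // omitted while the fact is still current
  source?: string;
  sourceResource?: string;
  sourceQuote?: string;
  notes?: string;
  currency?: string;       // ISO 4217 code for non-USD values
  usdEquivalent?: number;
}

// Illustrative helper: a fact counts as current when no validEnd is recorded.
function isCurrent(fact: Fact): boolean {
  return fact.validEnd === undefined;
}

const revenue: Fact = {
  id: "f_rev_2026_02",
  property: "revenue",
  value: 14e9,
  asOf: "2026-02",
  notes: "Run-rate revenue at Series G announcement",
};

console.log(isCurrent(revenue)); // true
```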
Properties
Properties are the taxonomy that gives facts meaning. Defined in packages/kb/data/properties.yaml, each property specifies:
- Data type: number, text, date, boolean, ref, refs
- Unit: USD, percent, tokens, employees, etc.
- Category: financial, people, safety, model, risk, policy, etc.
- Display config: divisor, prefix, suffix for human-readable formatting (e.g., $14.0B from 14e9)
- Temporal flag: whether values change over time (revenue, headcount) vs. are static (founded-date)
- Applies-to constraint: which entity types can use this property
- Inverse relationships: employed-by automatically generates employer-of on the referenced entity
There are currently 95 properties across 21 categories, supporting entity types from organizations and people to AI models, risks, policies, and incidents.
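The display config can be illustrated with a small formatter. This is a minimal sketch, not the KB's actual formatting code: the DisplayConfig type, the formatValue function, and the decimals option (defaulting to one decimal place) are assumptions; only divisor, prefix, and suffix come from the description above.

```typescript
// Hypothetical display config; divisor, prefix, and suffix mirror the prose.
interface DisplayConfig {
  divisor?: number;
  prefix?: string;
  suffix?: string;
  decimals?: number; // assumed option: decimal places, default 1
}

// Scale the raw value, fix the decimal places, then wrap it in prefix and suffix.
function formatValue(value: number, display: DisplayConfig = {}): string {
  const { divisor = 1, prefix = "", suffix = "", decimals = 1 } = display;
  return `${prefix}${(value / divisor).toFixed(decimals)}${suffix}`;
}

const usdBillions: DisplayConfig = { divisor: 1e9, prefix: "$", suffix: "B" };

console.log(formatValue(14e9, usdBillions));  // "$14.0B"
console.log(formatValue(380e9, usdBillions)); // "$380.0B"
```

Keeping the raw number in YAML and applying the divisor only at display time means the same fact can feed both formatted text and arithmetic.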
Inverse Relationships
When a property declares an inverseId, the KB automatically computes mirror facts. For example:
- employed-by on a person entity (e.g., Jan Leike -> Anthropic) generates an employer-of fact on the organization entity (Anthropic -> Jan Leike)
- founded-by on an organization generates founder-of on each referenced person
- developed-by on an AI model generates developed-models on the organization
Inverse facts are content-addressed (deterministic IDs based on a SHA-256 hash of subject + property + value) and carry the derivedFrom field pointing to their source fact. The computed: true flag on the inverse property prevents authors from manually storing both sides.
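The mirroring step might look roughly like the sketch below. It is written under several assumptions: the RefFact shape, the inverseOf lookup table, the "|" separator fed into the hash, and the truncation of the digest to 16 hex characters are all hypothetical. The text above confirms only that IDs derive from a SHA-256 hash of subject + property + value and that inverses carry derivedFrom.

```typescript
import { createHash } from "node:crypto";

// Hypothetical, simplified fact shape for single-ref properties.
interface RefFact {
  id: string;
  subject: string;      // entity the fact is stored on
  property: string;
  value: string;        // referenced entity ID
  derivedFrom?: string; // set only on computed inverses
}

// Hypothetical slice of the property registry: property ID -> inverse property ID.
const inverseOf: Record<string, string> = {
  "employed-by": "employer-of",
  "founded-by": "founder-of",
  "developed-by": "developed-models",
};

// Content-addressed ID: identical inputs always produce the identical ID.
function makeInverseId(subject: string, property: string, value: string): string {
  const digest = createHash("sha256")
    .update(`${subject}|${property}|${value}`) // separator is an assumption
    .digest("hex");
  return `inv_${digest.slice(0, 16)}`; // truncation length is an assumption
}

function computeInverseFacts(facts: RefFact[]): RefFact[] {
  const inverses: RefFact[] = [];
  for (const fact of facts) {
    const inverseProperty = inverseOf[fact.property];
    if (inverseProperty === undefined) continue; // property declares no inverse
    inverses.push({
      id: makeInverseId(fact.value, inverseProperty, fact.subject),
      subject: fact.value, // the referenced entity becomes the subject
      property: inverseProperty,
      value: fact.subject,
      derivedFrom: fact.id,
    });
  }
  return inverses;
}

const authored: RefFact[] = [
  { id: "f_emp_1", subject: "jan-leike", property: "employed-by", value: "anthropic" },
];
for (const inverse of computeInverseFacts(authored)) {
  console.log(inverse.subject, inverse.property, inverse.value); // anthropic employer-of jan-leike
}
```

Because the ID depends only on the fact's content, regenerating inverses on every build yields stable IDs without storing them in YAML.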
Type Schemas
Each entity type has a schema in packages/kb/data/schemas/ that defines:
- Required properties: must have at least one fact (validation error if missing)
- Recommended properties: should have facts (validation info if missing)
- Record collection schemas: structured sub-data with typed fields
For example, the organization schema recommends founded-date, headquarters, revenue, valuation, headcount, legal-structure, total-funding and defines record collections for funding-rounds, key-people, products, model-releases, board-members, strategic-partnerships, safety-milestones, research-areas, and grants-and-programs.
The 19 entity types with schemas are: ai-model, analysis, approach, argument, capability, case-study, concept, debate, event, funder, historical, incident, organization, person, policy, project, risk, risk-factor, and safety-agenda.
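A schema coverage check could be sketched as follows. The TypeSchema shape and the checkCoverage function are hypothetical simplifications (record collection definitions are left out), and the empty required list is a placeholder because this page does not list the organization schema's required properties.

```typescript
// Hypothetical, simplified schema: only the two property lists are modeled.
interface TypeSchema {
  type: string;
  required: string[];
  recommended: string[];
}

interface CoverageReport {
  missingRequired: string[];
  missingRecommended: string[];
}

// factProperties: property IDs for which the entity has at least one fact.
function checkCoverage(schema: TypeSchema, factProperties: string[]): CoverageReport {
  const present = new Set(factProperties);
  return {
    missingRequired: schema.required.filter((prop) => !present.has(prop)),
    missingRecommended: schema.recommended.filter((prop) => !present.has(prop)),
  };
}

const orgSchema: TypeSchema = {
  type: "organization",
  required: [], // placeholder: the actual required list is not shown on this page
  recommended: [
    "founded-date", "headquarters", "revenue", "valuation",
    "headcount", "legal-structure", "total-funding",
  ],
};

const report = checkCoverage(orgSchema, ["revenue", "valuation"]);
console.log(report.missingRecommended);
// ["founded-date", "headquarters", "headcount", "legal-structure", "total-funding"]
```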
Record Collections
Record collections store ordered sub-data that does not fit the simple property-value model. They are defined per entity type in the schema and stored inline in each entity's YAML file.
Examples:
| Collection | Entity Type | Fields |
|---|---|---|
| funding-rounds | organization | date, amount, valuation, lead_investor, source, notes |
| key-people | organization | person (ref), title, start, end, is_founder, source, notes |
| products | organization | name, launched, description, source, notes |
| model-releases | organization | name, released, description, safety_level, source, notes |
| board-members | organization | name, role, appointed, departed, background, appointed_by, source |
| safety-milestones | organization | name, date, type, description, source, notes |
Each entry has a stable key (e.g., i_OVNz9C3XUA) and typed fields validated against the schema. Fields can reference other entities via !ref YAML tags (e.g., person: !ref zR4nW8xB2f:dario-amodei), which are cross-validated during loading.
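To make the cross-validation concrete, here is a minimal sketch of resolving a !ref value against a stableId index. The "stableId:slug" format is inferred from the example above; parseRef, resolveRef, the index shape, and the decision to tolerate a stale slug (on the grounds that stableIds survive renames) are assumptions, not the loader's confirmed behavior.

```typescript
interface ParsedRef {
  stableId: string;
  slug: string;
}

// Assumed format, based on the example: "<stableId>:<slug>".
function parseRef(raw: string): ParsedRef {
  const separator = raw.indexOf(":");
  if (separator === -1) {
    throw new Error(`ref "${raw}" is missing the ":" separator`);
  }
  return { stableId: raw.slice(0, separator), slug: raw.slice(separator + 1) };
}

interface ResolvedRef {
  slug: string;         // current slug of the referenced entity
  slugMatches: boolean; // false when the ref carries an outdated slug
}

// The stableId must exist in the index; a stale slug is reported, not rejected (assumption).
function resolveRef(raw: string, index: Map<string, string>): ResolvedRef {
  const { stableId, slug } = parseRef(raw);
  const currentSlug = index.get(stableId);
  if (currentSlug === undefined) {
    throw new Error(`ref "${raw}" points to an unknown stableId`);
  }
  return { slug: currentSlug, slugMatches: currentSlug === slug };
}

// Hypothetical index: stableId -> slug, collected from every entity header.
const stableIdIndex = new Map<string, string>([["zR4nW8xB2f", "dario-amodei"]]);

console.log(resolveRef("zR4nW8xB2f:dario-amodei", stableIdIndex));
// { slug: "dario-amodei", slugMatches: true }
```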
MDX Components
The KB powers several MDX components used in wiki pages:
<KBF> - Inline Fact Display
Renders a single KB fact with a hover tooltip showing source, date, and notes:
<KBF entity="anthropic" property="valuation" /> {/* Auto-formatted: "$380.0B" */}
<KBF entity="anthropic" property="revenue" showDate /> {/* "$19.0B (as of 2026-03)" */}
<KBF entity="anthropic" property="valuation">$380B</KBF> {/* Custom display text */}
<KBFactValue> - Block Fact Display
Renders a fact value with more context, used in structured sections.
<Calc> - Computed Values
Derives a value from multiple facts at render time:
<Calc expr="{anthropic.valuation} / {anthropic.revenue}" precision={0} suffix="x" />
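One way such an expression could be evaluated is sketched below. This is not the component's actual implementation: evaluateCalc, the FactLookup callback, and the default precision of 2 are hypothetical, and the whitelist-then-Function evaluation is a shortcut chosen for brevity where a dedicated expression parser would be the more robust choice. The numbers come from the YAML example earlier on this page.

```typescript
// Hypothetical lookup: returns the latest numeric value for entity + property.
type FactLookup = (entity: string, property: string) => number | undefined;

function evaluateCalc(
  expr: string,
  lookup: FactLookup,
  options: { precision?: number; suffix?: string } = {},
): string {
  // Replace each {entity.property} placeholder with its parenthesized numeric value.
  const substituted = expr.replace(
    /\{([\w-]+)\.([\w-]+)\}/g,
    (_match, entity: string, property: string) => {
      const value = lookup(entity, property);
      if (value === undefined) {
        throw new Error(`no fact for ${entity}.${property}`);
      }
      return `(${String(value)})`;
    },
  );
  // Only digits, whitespace, exponent markers, and arithmetic characters may remain.
  if (!/^[\d\s.+\-*/()e]+$/i.test(substituted)) {
    throw new Error(`unsupported expression: ${expr}`);
  }
  const result = Number(new Function(`"use strict"; return (${substituted});`)());
  const { precision = 2, suffix = "" } = options;
  return `${result.toFixed(precision)}${suffix}`;
}

const numericFacts: Record<string, number> = {
  "anthropic.valuation": 380e9,
  "anthropic.revenue": 14e9,
};
const lookupFact: FactLookup = (entity, property) => numericFacts[`${entity}.${property}`];

console.log(
  evaluateCalc("{anthropic.valuation} / {anthropic.revenue}", lookupFact, { precision: 0, suffix: "x" }),
); // "27x"
```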
KB Footnotes
Facts referenced via <KBF> automatically generate footnote citations using the [^1] format. These appear in the page's footnote section with full source attribution.
Entity Sidebar and Fact Tables
Entity pages include an auto-generated sidebar panel showing key facts (KBEntitySidebar) and comparison tables (KBCompareTable, KBFactTable) for cross-entity analysis.
Validation
The KB runs 23 validation checks at build time, organized into three severity levels:
Errors (block build):
- Required properties missing
- Ref/refs point to nonexistent entities
- StableId format invalid (must be 10 alphanumeric chars)
- Fact ID format invalid (must start with f_ or inv_)
- Entity has empty name
- validEnd is earlier than asOf
- Duplicate stableIds across entities
- Record collection field type mismatches or missing required fields
- Range value bounds invalid
Warnings:
- Recommended properties missing
- Property used on wrong entity type
- Temporal property missing asOf date
- Non-temporal property has multiple facts
- Stale temporal data (>2 years old for financial/product categories)
- Duplicate facts (same entity + property + asOf)
- Missing source URL
- Unknown property ID
- Invalid date format
- Future asOf date
- Bidirectional redundancy (both sides of an inverse stored)
- Unknown currency code
Informational:
- Orphan entity (no facts and no records)
- Completeness percentage per entity
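A few of these checks can be sketched as a function that returns a list of issues tagged by severity. The Issue shape, the rule names, and the validateEntity function are hypothetical; the regular expressions encode only the formats stated above (10 alphanumeric characters, f_ or inv_ prefix, and YYYY, YYYY-MM, or YYYY-MM-DD dates).

```typescript
type Severity = "error" | "warning" | "info";

interface Issue {
  severity: Severity;
  rule: string; // hypothetical rule name
  message: string;
}

interface FactLike {
  id: string;
  asOf?: string;
  validEnd?: string;
}

const STABLE_ID = /^[A-Za-z0-9]{10}$/;
const FACT_ID = /^(f_|inv_)/;
const ISO_DATE = /^\d{4}(-\d{2}(-\d{2})?)?$/;

function validateEntity(stableId: string, facts: FactLike[]): Issue[] {
  const issues: Issue[] = [];
  if (!STABLE_ID.test(stableId)) {
    issues.push({ severity: "error", rule: "stable-id-format", message: `invalid stableId: ${stableId}` });
  }
  for (const fact of facts) {
    if (!FACT_ID.test(fact.id)) {
      issues.push({ severity: "error", rule: "fact-id-format", message: `invalid fact ID: ${fact.id}` });
    }
    for (const date of [fact.asOf, fact.validEnd]) {
      if (date !== undefined && !ISO_DATE.test(date)) {
        issues.push({ severity: "warning", rule: "date-format", message: `invalid date: ${date}` });
      }
    }
    // ISO dates of equal precision sort correctly as strings; mixed precision is not handled here.
    if (fact.asOf && fact.validEnd && fact.validEnd < fact.asOf) {
      issues.push({ severity: "error", rule: "valid-end-before-as-of", message: `${fact.id}: validEnd precedes asOf` });
    }
  }
  return issues;
}

const issues = validateEntity("mK9pX3rQ7n", [
  { id: "f_rev_2026_02", asOf: "2026-02" },
  { id: "rev_bad", asOf: "2026-02", validEnd: "2025-12" },
]);
console.log(issues.map((issue) => issue.rule));
// ["fact-id-format", "valid-end-before-as-of"]
```

A build step could then fail when any issue has severity "error" and merely print the rest.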
Data Pipeline
The KB integrates into the wiki's build pipeline:
- Load: loadKB() reads YAML files using a two-pass approach: the first pass builds the stableId index, the second pass resolves !ref tags and loads facts/items
- Compute inverses: computeInverses() generates mirror facts for all properties with inverseId
- Validate: validate() runs all 23 checks across the graph
- Serialize: serialize() converts the in-memory Graph to a plain JSON object
- Build-data: apps/web/scripts/build-data.mjs merges KB data into database.json
- Render: Next.js reads database.json at build time; MDX components look up facts by entity + property
The in-memory Graph class provides a rich query API: getEntity(), getFacts(), getLatest(), getByProperty(), getRelated(), getItems(), getItemsMentioning().
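To illustrate how a "latest fact" query might behave, the sketch below implements two of the listed methods on a toy class. MiniGraph, the DatedFact shape, and the method signatures are assumptions; the real Graph class may take different arguments and handle undated or mixed-precision dates differently.

```typescript
interface DatedFact {
  entity: string;
  property: string;
  value: number;
  asOf?: string; // YYYY, YYYY-MM, or YYYY-MM-DD
}

// Toy stand-in for the Graph class; signatures are hypothetical.
class MiniGraph {
  private readonly facts: DatedFact[];

  constructor(facts: DatedFact[]) {
    this.facts = facts;
  }

  getFacts(entity: string, property?: string): DatedFact[] {
    return this.facts.filter(
      (fact) => fact.entity === entity && (property === undefined || fact.property === property),
    );
  }

  // Latest = greatest asOf by string comparison; undated facts rank lowest.
  getLatest(entity: string, property: string): DatedFact | undefined {
    let latest: DatedFact | undefined;
    for (const fact of this.getFacts(entity, property)) {
      if (latest === undefined || (fact.asOf ?? "") > (latest.asOf ?? "")) {
        latest = fact;
      }
    }
    return latest;
  }
}

const graph = new MiniGraph([
  { entity: "anthropic", property: "revenue", value: 14e9, asOf: "2026-02" },
  { entity: "anthropic", property: "revenue", value: 19e9, asOf: "2026-03" },
  { entity: "anthropic", property: "valuation", value: 380e9, asOf: "2026-02" },
]);

console.log(graph.getLatest("anthropic", "revenue")?.value); // 19000000000
```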
Relationship to Statements in PostgreSQL
The wiki-server's PostgreSQL database stores statements (formerly called "claims") - text-level assertions extracted from wiki page prose. These are complementary to KB facts, not a replacement:
| Dimension | KB Facts (YAML) | Statements (PostgreSQL) |
|---|---|---|
| Purpose | Curated canonical data | Page text analysis |
| Storage | YAML files in git | PostgreSQL |
| Authoring | Manual curation | LLM-extracted from prose |
| Data types | Typed (number, date, ref, range, etc.) | Free text with optional structured fields |
| Temporal | asOf / validEnd | valueDate |
| Source tracking | source, sourceResource, sourceQuote | claim_sources join table |
| Display | <KBF>, <Calc>, tooltips | Statements table on entity pages |
| Build dependency | Yes (feeds database.json) | No (runtime queries) |
KB facts are the authoritative source for structured data that appears across wiki pages. Statements provide a complementary view of what the prose says, useful for citation checking, hallucination detection, and verifying consistency between page text and canonical data.
Current Status
| Component | Status |
|---|---|
| KB YAML data layer (362 entities) | Working |
| Property taxonomy (95 properties) | Working |
| 19 entity type schemas with record collections | Working |
| 23 validation rules | Working |
| Inverse relationship computation | Working |
| !ref and !date YAML tags with cross-validation | Working |
| Currency support with USD equivalents | Working |
| <KBF> inline fact display with tooltips | Working |
| <KBFactValue> block display | Working |
| <Calc> computed values | Working |
| KB footnotes integration | Working |
| Entity sidebar and fact tables | Working |
| Comparison tables (KBCompareTable) | Working |
| Build pipeline integration (YAML -> database.json) | Working |
Future Directions
Several areas are under consideration for extending the KB system:
- Broader entity coverage: Expanding beyond the current 362 entities to cover more of the wiki's ~700 pages with structured facts
- Richer record collection types: Adding more collection schemas (e.g., regulatory actions, benchmark results, governance events) to entity types beyond organization
- Temporal queries in components: Enabling <KBF> to reference facts at specific dates (e.g., "Anthropic's revenue as of 2025-03") rather than always showing the latest
- Cross-entity computed properties: Supporting <Calc> expressions that aggregate across entities (e.g., total funding across all AI safety organizations)
- Statement-KB reconciliation: Automated checks that verify page prose (statements) stays consistent with canonical KB facts, flagging contradictions for human review
- Property completeness tracking: Dashboard showing which entity types have the best coverage and where data gaps exist