Wiki Generation Architecture: Multi-Agent Multi-Pass Design
This is an architecture proposal, not a description of the current system. Some elements have been partially implemented (section-level rewriting, KB fact system, basic citation verification), but the full multi-agent orchestrator has not been built. The KB system (packages/kb/) now provides the structured data layer that this architecture assumes as a prerequisite — structured facts in YAML, <KBF> for inline values, and <Calc> for computed values.
Executive Summary
See also: The Claim-First Wiki Architecture proposal (removed) was a companion proposal that inverted the data model, making verified atomic claims the primary artifact. That proposal's structured-data ideas have been partially superseded by the KB system (packages/kb/), which provides entity-level structured facts in YAML, the <KBF> component for inline fact values, and <Calc> for computed values.
Our current page generation pipeline (Crux content create/improve) is a single-pipeline, single-agent system. It works, but produces pages that are adequate rather than excellent. The best wiki pages require depth that a single LLM pass cannot achieve: dense cross-linking, verified citations, complex diagrams, embedded calculations, and knowledge graph coherence.
This document proposes a multi-agent, multi-pass architecture inspired by Stanford's STORM, Microsoft's GraphRAG, CrewAI's specialist agent patterns, and the Self-Refine iterative paradigm. The core idea: decompose page generation into composable passes, each executed by a specialist agent optimized for one concern.
| Current System | Proposed System |
|---|---|
| Single synthesis prompt | 12+ composable passes |
| One LLM does everything | 8 specialist agents |
| Research then write (2 phases) | Research, structure, write, link, verify, compute, diagram, review (8+ phases) |
| Knowledge graph consulted at link time | Knowledge graph drives content planning |
| Static calculations | Dynamic Squiggle models derived from wiki data |
| Post-hoc validation | Validation integrated into each pass |
| $4-15 per page | $8-25 per page (higher quality ceiling) |
Part 1: Problems with the Current System
What We Have
The current pipeline (crux/authoring/page-creator.ts) follows this flow:
`canonical-links -> research -> source-fetching -> synthesis -> verification -> validation -> grade`
This produces pages scoring 70-80/100 on our grading rubric. The pipeline has been iterated significantly (see the Page Creator Pipeline report) and represents solid work. But it has structural limitations:
Limitation 1: Single-Agent Synthesis Bottleneck
One Claude call synthesizes the entire article from research. This means the model must simultaneously:
- Write coherent prose
- Place citations correctly
- Decide which EntityLinks to use
- Structure sections per template
- Include appropriate tables and diagrams
- Maintain balanced perspective
No single prompt can optimize all of these. The result: pages that are structurally correct but lack depth in cross-linking, calculations, and visual elements.
Limitation 2: Knowledge Graph is Read-Only
The current system consults the entity database to resolve EntityLinks, but doesn't use the knowledge graph to plan content. A page about "deceptive alignment" should proactively cover its graph neighbors (situational awareness, mesa-optimization, sleeper agents) with appropriate depth. Currently, this happens only if the LLM independently decides to mention them.
Limitation 3: No Iterative Deepening
The pipeline runs once. If the synthesis phase produces a page with weak sections, those sections stay weak. The review phase in the improver can identify gaps, but the fix is another monolithic LLM call. There's no mechanism for targeted, section-level improvement.
Update (Feb 2026): The --section-level flag (pnpm crux content improve <id> --section-level) now implements per-section rewriting: the page is split on ## headings, each section rewritten independently via rewriteSection(), then reassembled with renumbered footnotes. See crux/lib/section-splitter.ts and crux/authoring/page-improver/phases/improve-sections.ts. This addresses the "targeted improvement" limitation above; the deeper limitations (graph-aware planning, diagram agents) remain future work.
Limitation 4: Diagrams and Calculations are Afterthoughts
Mermaid diagrams and Squiggle models are included only if the synthesis prompt happens to produce them. There's no dedicated agent reasoning about what visual or computational elements would add value, and no agent that specializes in producing high-quality versions of these.
Limitation 5: Cross-Linking is Shallow
EntityLinks are added during synthesis, then validated. But the system doesn't reason about the topology of links: which inbound links should this page attract? Which pages should link to this one? A new page about "compute governance" should trigger updates to pages about "compute thresholds," "chip export controls," and "training run monitoring."
Part 2: State of the Art
Stanford STORM (2024)
STORM (Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking) is the closest academic system to what we need. Key innovations:
| Innovation | How It Works | Relevance to Us |
|---|---|---|
| Perspective-guided research | Discovers multiple perspectives by surveying similar articles, then simulates conversations from each perspective | We could mine perspectives from our existing 625 pages on related topics |
| Simulated expert conversations | A "writer" agent asks questions to a "topic expert" agent grounded in search results. Follow-up questions arise naturally | Better than our "dump all research into one synthesis prompt" approach |
| Two-stage pipeline | Pre-writing (research + outline) is separated from writing. Outline quality correlates with article quality | We already do this loosely; could formalize it |
| Co-STORM mind map | Organizes collected information into a hierarchical concept structure updated throughout the process | Maps to our entity graph, but dynamically maintained during authoring |
Key finding: STORM articles were rated 25% better organized and 10% broader in coverage than baseline RAG approaches by Wikipedia editors.
Limitation: STORM produces Wikipedia-style articles but doesn't handle our specific requirements: EntityLinks, Squiggle models, Mermaid diagrams, YAML entity synchronization, or the frontmatter/grading system.
Microsoft GraphRAG (2024)
GraphRAG extends RAG with knowledge graph structure. Instead of retrieving text chunks, it retrieves subgraphs -- entities, relationships, and community summaries.
| Innovation | How It Works | Relevance to Us |
|---|---|---|
| Community detection | Clusters related entities and generates hierarchical summaries | We could use this to identify which entities a new page should cover |
| Global search via map-reduce | Pre-generates community summaries, then runs map-reduce across them for corpus-wide questions | Useful for "what's the relationship between X and all its neighbors?" |
| Entity extraction pipeline | Extracts entities and relationships from text, builds graph | We already have this (YAML entities + content scanning), but could improve |
Key finding: GraphRAG dramatically outperforms naive RAG on multi-hop reasoning and synthesis questions. Exactly the kind of reasoning needed for wiki cross-linking.
CrewAI Specialist Agent Pattern (2025)
CrewAI demonstrates that splitting work across specialist agents with clear handoff contracts produces better results than one mega-agent.
The pattern: Researcher -> Writer -> Editor -> Specialist, with each agent optimized for its role (different system prompts, different tools, potentially different models).
Key insight from CrewAI: "Squeezing too much into one agent causes context windows to blow up, too many tools confuse it, and hallucinations increase." This directly explains our synthesis bottleneck.
Self-Refine (Madaan et al., 2023)
Self-Refine demonstrates that iterative generate -> feedback -> refine loops improve LLM output by ~20% on average. The key: the same model generates, critiques, and refines, but with different prompts for each role.
Key finding: The refine loop works best when feedback is specific and actionable (not "make it better" but "paragraph 3 lacks a citation for the 40% claim"). This maps to our validation rules, which already produce specific, fixable issues.
SemanticCite (2025)
SemanticCite proposes a pipeline for citation verification: extract claims, retrieve source passages via hybrid search, classify support level (SUPPORTED / PARTIALLY SUPPORTED / UNSUPPORTED / UNCERTAIN). Their fine-tuned models achieve competitive performance with commercial systems.
Relevance: We already have a verify-sources phase, but it's coarse-grained. Per-claim verification with confidence scoring would significantly improve citation quality.
Anthropic Multi-Agent Research System (2025)
Anthropic's own research system uses an orchestrator-worker pattern: a lead agent analyzes a query, develops strategy, and spawns subagents to explore different aspects in parallel. Multi-agent Opus + Sonnet outperformed single-agent Opus by 90.2% on their research eval.
Key insight: Use expensive models (Opus) for orchestration and synthesis, cheap models (Sonnet/Haiku) for parallel research and extraction. This is exactly the cost structure we should adopt.
Part 3: Proposed Architecture
Core Principle: Composable Passes
Instead of a monolithic pipeline, we define passes that can be composed in different orders depending on the page type, tier, and goals. Each pass:
- Takes a well-defined input (page draft + metadata)
- Produces a well-defined output (modified draft + metadata)
- Is idempotent (running it twice produces the same result)
- Has a cost estimate
- Can be run independently for debugging
The 8 Specialist Agents
Each agent has a focused role, specific tools, and an optimal model choice:
| # | Agent | Role | Model | Tools | Cost/Page |
|---|---|---|---|---|---|
| 1 | Orchestrator | Plans strategy, schedules passes, checks quality gates | Opus | All agents, quality scorer | $1-2 |
| 2 | Researcher | Web search, academic search, source fetching | Sonnet | Perplexity, SCRY, Firecrawl | $1-3 |
| 3 | Graph Analyst | Analyzes knowledge graph neighbors, plans cross-links | Sonnet | Entity DB, backlinks, graph data | $0.50-1 |
| 4 | Structurer | Generates outlines, ensures template compliance | Sonnet | Page templates, existing page analysis | $0.50-1 |
| 5 | Writer | Section-by-section prose synthesis from research | Opus | Research output, entity lookup | $2-4 |
| 6 | Enricher | Creates diagrams, Squiggle models, tables | Sonnet | Mermaid validator, Squiggle runtime | $1-2 |
| 7 | Verifier | Citation checking, EntityLink resolution, fact validation | Haiku | Source DB, validation engine | $0.25-0.50 |
| 8 | Reviewer | Identifies gaps, bias, weak sections; triggers re-passes | Opus | Quality rubric, template checker | $1-2 |
Total estimated cost: $8-16 for standard tier (vs $4-6 currently). The quality ceiling is substantially higher.
The 12+ Composable Passes
Research Passes
Pass R1: Perspective Discovery
- Input: Topic title + entity type
- Process: Survey our existing pages on related topics. What perspectives do they cover? What's missing? (Inspired by STORM's perspective mining)
- Output: List of 5-10 perspectives to investigate (e.g., for "compute governance": technical feasibility, political economy, international coordination, industry self-regulation, civil liberties)
- Agent: Graph Analyst
- Cost: $0.25
Pass R2: Multi-Source Research
- Input: Topic + perspectives list
- Process: For each perspective, run targeted Perplexity queries. Fetch and register sources.
- Output: `research.json` with categorized findings per perspective
- Agent: Researcher
- Cost: $1-3
Pass R3: Graph Neighbor Analysis
- Input: Topic + entity database
- Process: Identify all entities within 2 hops in the knowledge graph. Analyze which are most relevant and what relationship labels apply. Determine which existing pages should link to this new page.
- Output: `graph-context.json` with neighbor entities, relationship types, and suggested inbound link updates
- Agent: Graph Analyst
- Cost: $0.50
Pass R4: Existing Content Analysis
- Input: Topic + similar pages (from redundancy detection)
- Process: Read the top 5 most similar existing pages. Identify what this page should cover that they don't, and what it can reference rather than repeat.
- Output: `content-gap.json` with unique angles and cross-references
- Agent: Graph Analyst
- Cost: $0.50
Structure Passes
Pass S1: Outline Generation
- Input: Research output + template + graph context
- Process: Generate a detailed section-by-section outline with word count targets and required elements per section (tables, citations, diagrams)
- Output: `outline.json` with sections, subsections, planned elements
- Agent: Structurer
- Cost: $0.50
Pass S2: Knowledge Graph Planning
- Input: Outline + entity database
- Process: For each section, identify which EntityLinks should appear. Plan where Squiggle models and diagrams will go. Identify facts to extract to YAML.
- Output: Enriched outline with EntityLink targets, diagram specs, computation specs
- Agent: Graph Analyst
- Cost: $0.50
Content Passes
Pass C1: Section-by-Section Synthesis
- Input: Outline + research + graph context (one section at a time)
- Process: Write each section independently, using only the research relevant to that section. Enforce citation discipline per section.
- Output: Draft page with all sections assembled
- Agent: Writer
- Cost: $2-4
This is the biggest departure from the current system. Instead of one synthesis call, we write section by section. Each section gets a focused context window with only the relevant research, entity lookups, and template requirements. This prevents context window overload and ensures each section gets full attention.
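To make the per-section context idea concrete, here is a hypothetical sketch of how Pass C1 might assemble a focused prompt context. `SectionPlan` and `ResearchFinding` are illustrative shapes, not the real Crux types; the point is that each synthesis call sees only the research tagged as relevant to its section.

```typescript
// Illustrative shapes (assumptions, not actual Crux types).
interface ResearchFinding {
  sectionIds: string[]; // sections this finding is relevant to
  text: string;
}

interface SectionPlan {
  id: string;
  heading: string;
  wordTarget: number;
}

// Build the context for one section: heading, word target, and only
// the findings tagged for that section.
function contextForSection(
  section: SectionPlan,
  findings: ResearchFinding[],
): string {
  const relevant = findings.filter((f) => f.sectionIds.includes(section.id));
  return [
    `## ${section.heading} (target: ~${section.wordTarget} words)`,
    ...relevant.map((f) => `- ${f.text}`),
  ].join("\n");
}

const ctx = contextForSection(
  { id: "risks", heading: "Key Risks", wordTarget: 400 },
  [
    { sectionIds: ["risks"], text: "Claim A from source 1" },
    { sectionIds: ["history"], text: "Claim B from source 2" },
  ],
);
// Only the finding tagged "risks" appears in the assembled context.
```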
Pass C2: Citation Placement
- Input: Draft page + source database
- Process: Verify every factual claim has a citation. Add missing citations from the source database. Convert inline URLs to `<R>` components where sources exist.
- Output: Fully cited draft
- Agent: Verifier
- Cost: $0.25
Pass C3: EntityLink Enrichment
- Input: Draft + entity database
- Process: Scan for entity mentions that lack EntityLinks. Add `<EntityLink>` components for all resolvable entities. Ensure link density meets template requirements.
- Output: Cross-linked draft
- Agent: Graph Analyst
- Cost: $0.25
Enrichment Passes
Pass E1: Diagram Generation
- Input: Draft + outline diagram specs
- Process: For each planned diagram location, generate a Mermaid diagram that visualizes the concept. Validate syntax. Follow Mermaid style guide (max 15-20 nodes, flowchart TD, proper colors).
- Output: Draft with embedded diagrams
- Agent: Enricher
- Cost: $0.50-1
Pass E2: Computation Embedding
- Input: Draft + facts database + graph data
- Process: Identify quantitative claims that could be dynamic. Create Squiggle models that compute from wiki data (KB facts in `packages/kb/data/things/`, entity metrics). Embed `<SquiggleEstimate>` components.
- Output: Draft with dynamic computations
- Agent: Enricher
- Cost: $0.50-1
Pass E3: Table Structuring
- Input: Draft
- Process: Identify data that's better presented as tables. Ensure tables have proper headers, sourced data, and comparative structure. Enforce the "max 4 tables, tables are for genuinely comparative data" rule.
- Output: Draft with optimized tables
- Agent: Enricher
- Cost: $0.25
Pass E4: Fact Extraction
- Input: Draft + existing KB facts
- Process: Extract key quantitative claims from the page and propose additions to KB facts in `packages/kb/data/things/`. Link computed facts to their source pages via `<KBF>` references.
- Output: Proposed KB fact entries + draft with `<KBF>` references
- Agent: Enricher
- Cost: $0.25
- Status: The KB system now partially serves this role. Structured facts for 360+ entities exist in `packages/kb/data/things/*.yaml`, with properties defined in `packages/kb/data/properties.yaml`. Pages can reference these via `<KBF>` components and `[^1]` footnotes. The automated extraction pipeline (proposing new facts from page content) has not been built.
Verification Passes
Pass V1: Citation Verification
- Input: Draft + source database
- Process: For each citation, verify the claim is actually supported by the source. Classify as SUPPORTED / PARTIALLY_SUPPORTED / UNSUPPORTED. Flag unsupported claims.
- Output: Verification report + flagged claims
- Agent: Verifier (inspired by SemanticCite)
- Cost: $0.25-0.50
Pass V2: Validation Rules
- Input: Draft
- Process: Run the full validation engine (dollar signs, comparison operators, frontmatter schema, EntityLink IDs, etc.). Auto-fix where possible.
- Output: Clean draft passing all blocking rules
- Agent: Verifier
- Cost: $0.10
Pass V3: Self-Review
- Input: Draft + template + quality rubric
- Process: Grade the page against the template rubric. Identify weak sections with specific, actionable feedback. Determine if another pass through content/enrichment is needed.
- Output: Review report with per-section scores and improvement suggestions
- Agent: Reviewer (inspired by Self-Refine)
- Cost: $1-2
Iterative Refinement Loop
The Reviewer (V3) can trigger re-execution of specific passes. The orchestrator caps iterations (default: two refinement cycles) to control cost; each cycle re-runs only the passes the review identifies as needed.
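A hypothetical sketch of that capped loop: the reviewer scores sections against the rubric, the orchestrator re-runs passes only for sections below a threshold, and stops after `maxCycles`. All names here are illustrative assumptions, not existing Crux code.

```typescript
interface SectionScore {
  sectionId: string;
  score: number; // 0-100 per the grading rubric
}

// Sections below the quality threshold get targeted for re-passes.
function refinementTargets(review: SectionScore[], threshold = 75): string[] {
  return review.filter((s) => s.score < threshold).map((s) => s.sectionId);
}

// Run at most `maxCycles` refinement cycles; stop early once every
// section clears the gate.
function runRefinement(
  review: SectionScore[],
  rewrite: (sectionId: string) => SectionScore,
  maxCycles = 2,
): SectionScore[] {
  let current = review;
  for (let cycle = 0; cycle < maxCycles; cycle++) {
    const targets = refinementTargets(current);
    if (targets.length === 0) break; // quality gate passed
    current = current.map((s) =>
      targets.includes(s.sectionId) ? rewrite(s.sectionId) : s,
    );
  }
  return current;
}

const result = runRefinement(
  [
    { sectionId: "overview", score: 60 },
    { sectionId: "risks", score: 90 },
  ],
  (id) => ({ sectionId: id, score: 80 }), // stub: assume the rewrite lifts the score
);
```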
Part 4: Tier Configurations
Budget Tier ($5-8)
For drafts and low-importance pages:
`R2(lite) -> S1 -> C1 -> V2 -> V3(single)`
5 passes, no enrichment, no iterative refinement. Produces a well-structured, cited article without diagrams or calculations.
Standard Tier ($12-18)
For most pages:
`R1 -> R2 -> R3 -> S1 -> S2 -> C1 -> C2 -> C3 -> E1 -> E3 -> V1 -> V2 -> V3 -> [1 refinement cycle]`
13+ passes including diagrams, cross-linking, and one refinement cycle. The expected quality ceiling.
Premium Tier ($20-30)
For high-importance or controversial pages:
`R1 -> R2(deep) -> R3 -> R4 -> S1 -> S2 -> C1 -> C2 -> C3 -> E1 -> E2 -> E3 -> E4 -> V1 -> V2 -> V3 -> [2 refinement cycles]`
All passes including Squiggle models, fact extraction, content gap analysis, and two refinement cycles.
Polish Tier ($3-5)
For improving existing pages (replaces current crux content improve):
`R3 -> C3 -> E1(if missing) -> V1 -> V2 -> V3`
Focuses on cross-linking, enrichment gaps, and citation verification. Doesn't rewrite prose.
Part 5: Knowledge Graph Integration
Graph-Driven Content Planning
The biggest architectural shift: the knowledge graph drives content creation rather than being consulted as an afterthought.
Bidirectional Link Updates
When a new page is created, the system should also propose updates to existing pages that should link to it. The Graph Analyst agent:
- Identifies entities in the graph that relate to the new page
- Reads existing pages for those entities
- Identifies natural insertion points for EntityLinks to the new page
- Produces a `link-updates.json` with proposed edits
This transforms page creation from an isolated act into a graph maintenance operation.
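A hypothetical shape for the `link-updates.json` artifact the Graph Analyst produces. The field names are assumptions about what an editor (human or agent) would need to apply the inbound-link edits; the real schema would be defined alongside the pass.

```typescript
interface LinkUpdate {
  targetPage: string;    // existing page to edit
  anchorText: string;    // mention to wrap in an EntityLink
  newEntityId: string;   // the newly created page's entity ID
  insertionHint: string; // section or sentence where the link fits
}

// Example: a new "compute governance" page triggers a proposed edit to
// the existing compute-thresholds page.
const updates: LinkUpdate[] = [
  {
    targetPage: "compute-thresholds",
    anchorText: "compute governance",
    newEntityId: "compute-governance",
    insertionHint: "Policy Context section, first mention",
  },
];
```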
Community Summaries (GraphRAG-inspired)
For entity clusters (e.g., all "alignment approaches"), maintain pre-computed community summaries that:
- Describe the cluster's theme
- List key entities and their relationships
- Identify gaps (entities that exist but lack pages)
These summaries can be used by the Writer agent as context when synthesizing content that touches on a cluster.
Part 6: Dynamic Computation Embedding
The Vision
Partial implementation note: The <Calc> and <KBF> (KB Fact) components now partially implement the "computation embedding" vision described here. <KBF> pulls live values from KB YAML files in packages/kb/data/things/, and <Calc> computes derived values from those facts. The Squiggle-based uncertainty modeling described below remains aspirational.
Wiki pages shouldn't just state numbers -- they should compute them from the knowledge base.
Example: A page about "AI lab safety spending" could include:
```jsx
<SquiggleEstimate
  title="Estimated AI Safety Spending (2025)"
  code={`
    anthropicRevenue = 2B to 3.5B
    openaiRevenue = 3B to 5B
    deepmindBudget = 1.5B to 2.5B
    safetyFraction = {
      anthropic: 0.15 to 0.25,
      openai: 0.05 to 0.12,
      deepmind: 0.10 to 0.20
    }
    totalSafetySpending = anthropicRevenue * safetyFraction.anthropic
      + openaiRevenue * safetyFraction.openai
      + deepmindBudget * safetyFraction.deepmind
  `}
/>
```
How the Enricher Agent Creates These
- Identify computational opportunities: Scan the draft for quantitative claims that involve estimation or aggregation
- Pull from KB facts: Use existing KB fact values from `packages/kb/data/things/` as inputs where available
- Create Squiggle models: Write distribution-based models (never point estimates) following the Squiggle style guide
- Validate: Run the Squiggle code to ensure it executes without errors
- Embed: Place `<SquiggleEstimate>` components at appropriate locations
Fact Feedback Loop
The Enricher can also propose new facts to the KB (packages/kb/data/things/) based on claims in the page, creating a feedback loop where page content enriches the data layer, which in turn feeds future computations.
Part 7: Implementation Plan
Phase 1: Pass Infrastructure (Foundation)
Build the composable pass system on top of existing Crux infrastructure:
```typescript
interface Pass {
  id: string;
  name: string;
  agent: AgentType;

  // Input/output contract
  requires: string[]; // IDs of passes that must run first
  produces: string[]; // Artifact keys this pass creates

  // Execution
  execute(context: PassContext): Promise<PassResult>;

  // Cost estimation
  estimateCost(context: PassContext): number;
}

interface PassContext {
  topic: string;
  entityType: string;
  tier: TierConfig;

  // Accumulated artifacts from prior passes
  artifacts: Map<string, any>;

  // Shared resources
  entityDb: EntityDatabase;
  sourceDb: SourceDatabase;
  validationEngine: ValidationEngine;
}
```
This leverages the existing Crux validation engine, entity lookup, and source database. Each pass is a module in crux/authoring/passes/.
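One way the orchestrator might sequence a tier's pass list is a plain topological sort over `requires`. This is a self-contained sketch, not existing Crux code; `PassSpec` is a deliberately minimal stand-in for the full `Pass` interface.

```typescript
// Minimal stand-in for Pass: just the dependency contract.
interface PassSpec {
  id: string;
  requires: string[]; // IDs of passes that must run first
}

// Depth-first topological sort; throws on dependency cycles.
function orderPasses(passes: PassSpec[]): string[] {
  const byId = new Map(passes.map((p) => [p.id, p]));
  const ordered: string[] = [];
  const visiting = new Set<string>();
  const done = new Set<string>();

  function visit(id: string): void {
    if (done.has(id)) return;
    if (visiting.has(id)) throw new Error(`dependency cycle at ${id}`);
    visiting.add(id);
    for (const dep of byId.get(id)?.requires ?? []) visit(dep);
    visiting.delete(id);
    done.add(id);
    ordered.push(id);
  }

  for (const p of passes) visit(p.id);
  return ordered;
}

// Standard-tier fragment: synthesis (C1) needs the outline (S1),
// which needs research (R2).
const order = orderPasses([
  { id: "C1", requires: ["S1"] },
  { id: "S1", requires: ["R2"] },
  { id: "R2", requires: [] },
]);
// order: ["R2", "S1", "C1"]
```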
Phase 2: Specialist Agents
Implement agents as wrappers around Claude API calls with focused system prompts:
```typescript
interface Agent {
  id: AgentType;
  model: 'opus' | 'sonnet' | 'haiku';
  systemPrompt: string;
  tools: Tool[];
  run(input: AgentInput): Promise<AgentOutput>;
}
```
Start with the Writer and Verifier agents (biggest impact), then add Graph Analyst and Enricher.
Phase 3: Orchestrator
Build the orchestrator that plans pass sequences and manages quality gates:
```typescript
interface Orchestrator {
  planPasses(topic: string, tier: Tier, entityType: string): Pass[];
  executePlan(passes: Pass[], context: PassContext): Promise<Page>;
  checkQualityGate(page: Page, template: Template): QualityResult;
  planRefinement(review: ReviewResult): Pass[];
}
```
Phase 4: Graph Integration
Add bidirectional link updates and community summaries. This requires changes to build-data.mjs to compute community clusters and maintain summary cache.
Phase 5: Computation Embedding
Add the Squiggle model generation pass. Requires integration with the Squiggle runtime for validation.
Migration Path
The new system can coexist with the current pipeline:
```bash
# Current system (preserved)
pnpm crux content create "Topic" --tier=standard

# New system (opt-in)
pnpm crux content create "Topic" --tier=standard --engine=v2

# Eventually
pnpm crux content create "Topic" --tier=standard  # defaults to v2
```
Part 8: Comparison with Alternatives
Alternative A: Keep Improving Current Pipeline
Pros: Lower risk, incremental improvement, already works.
Cons: Structural ceiling -- single-agent synthesis can't produce the cross-linking, computation, and diagram density we want. Diminishing returns on prompt engineering.
Alternative B: Adopt STORM Directly
Pros: Battle-tested, open source, good research quality.
Cons: Python-only (we're TypeScript), no support for EntityLinks/Squiggle/Mermaid/YAML entities, no knowledge graph integration, would require heavy forking. The research stage is valuable but the writing stage doesn't match our needs.
Alternative C: Full CrewAI/LangGraph Framework
Pros: Rich agent orchestration, built-in patterns for sequential/parallel execution.
Cons: Heavy framework dependency, Python ecosystem, abstractions don't map cleanly to our YAML-first data model. We'd spend more time fighting the framework than building features.
Alternative D: Claude Agent SDK Multi-Agent (Proposed Approach)
Pros: Same TypeScript ecosystem, direct Anthropic API integration, subagent orchestration built-in, proven at scale (90.2% improvement over single-agent in Anthropic's own eval). Builds on existing Crux infrastructure.
Cons: Higher implementation effort than Alternative A, less battle-tested than STORM for research quality.
Recommendation: Alternative D, borrowing specific ideas from STORM (perspective-guided research, simulated conversations) and GraphRAG (community summaries, graph-driven planning).
Part 9: Key Ideas Borrowed
| Source | Idea | How We Use It |
|---|---|---|
| STORM | Perspective-guided question asking | Pass R1 discovers perspectives from existing wiki pages |
| STORM | Simulated expert conversations | Writer agent can simulate researcher/expert dialogue |
| STORM | Pre-writing/writing separation | Passes R1-S2 are pre-writing; C1+ is writing |
| GraphRAG | Community summaries | Pre-computed cluster summaries for entity groups |
| GraphRAG | Subgraph retrieval | Graph Analyst retrieves relevant subgraph, not just entity list |
| CrewAI | Specialist agents with handoff contracts | 8 agents with typed input/output |
| CrewAI | Sequential pipeline with clear boundaries | Pass dependency graph |
| Self-Refine | Generate-feedback-refine loop | V3 reviewer triggers targeted re-passes |
| Self-Refine | Specific, actionable feedback | Reviewer produces per-section scores, not vague "improve" |
| SemanticCite | Per-claim citation verification | V1 classifies each claim's support level |
| Anthropic Research | Orchestrator + parallel subagents | Orchestrator plans, specialists execute (potentially in parallel) |
| Anthropic Research | Expensive orchestrator, cheap workers | Opus for orchestrator/writer/reviewer, Sonnet/Haiku for research/verification |
Part 10: Success Metrics
How we'll know the new architecture is working:
| Metric | Current | Target | How Measured |
|---|---|---|---|
| Average page quality score | 70-78 | 82-90 | Template grading rubric |
| EntityLinks per page | 5-10 | 15-25 | Metrics extractor |
| Citations per page | 35-42 | 40-60 | Footnote count |
| Diagrams per page | 0-1 | 1-3 | Metrics extractor |
| Squiggle models per page | 0 | 0-2 (where applicable) | Component count |
| Inbound links created | 0 | 3-5 per new page | Bidirectional link updates |
| Facts extracted to YAML | 0 | 2-5 per new page | Fact extraction pass |
| Cost per standard page | $4-6 | $12-18 | API cost tracking |
The cost increase is intentional. We're trading $8-12 more per page for substantially higher quality. At 625 pages, even regenerating the entire wiki would cost $7,500-11,000 -- a one-time investment.
Part 11: Composable Module Architecture (February 2026 Refinement)
The original proposal (Parts 3-7) frames the system as specialist agents with composable passes. A further refinement: the passes themselves should be reusable modules — independent tools that compose in the improve pipeline, auto-update, page creation, and standalone CLI commands.
Core Insight: Modules as Agent Tools
Instead of a fixed pipeline where passes run in a predetermined sequence, the orchestrator is an LLM agent that has modules available as tools and decides what to call based on what the page actually needs:
Agent reads page → analyzes gaps → calls tools → checks result → iterates
A page with good prose but no diagrams gets diagram tools. A page with bad sourcing gets research + citation tools. The agent adapts to the page rather than running a fixed sequence.
The Module Kit
Research & Grounding:
| Module | Purpose | Standalone CLI | Cost |
|---|---|---|---|
source-fetcher | Fetch URL → clean markdown + relevant excerpts | crux citations verify (upgraded) | $0 (no LLM) |
research-agent | Multi-source search → structured facts with quotes | crux research <topic> | $1-3 |
source-cache | Persistent store of fetched sources + extracted facts | Reused across runs | $0 |
claim-verifier | Check if source supports a specific claim | Per-citation in auditor | $0.01/claim (Haiku) |
citation-auditor | Independent verification of all citations on a page | crux citations audit <id> | $0.10-0.30 |
Content Writing:
| Module | Purpose | Standalone CLI | Cost |
|---|---|---|---|
rewrite-section | Rewrite ONE section, constrained to source cache | Used by orchestrator | $0.10-0.30/section |
Enrichment:
| Module | Purpose | Standalone CLI | Cost |
|---|---|---|---|
add-entity-links | Insert EntityLink components for mentioned entities | crux enrich entity-links <id> | $0.05 (Haiku) |
add-fact-refs | Wrap hardcoded numbers in <KBF> tags referencing KB facts | crux enrich fact-refs <id> | $0.05 (Haiku) |
add-diagram | Generate Mermaid diagram for a section | crux enrich diagram <id> | $0.10-0.20 |
add-squiggle | Add uncertainty modeling for a section | crux enrich squiggle <id> | $0.10-0.20 |
Why Section-Level Matters
The current pipeline rewrites 2,000-4,000 words in one LLM call. The prompt must simultaneously handle prose, citations, EntityLinks, Facts, Calc, escaping, and structure. Enrichments compete for attention.
With section-level rewrite-section:
- Focused context: LLM handles 200-500 words at a time
- Better grounding: only relevant sources for that section
- Isolated enrichments: adding a diagram can't break citations
- Partial progress: if budget runs out, 3 improved sections beats nothing
Source-Fetcher as Foundation
The single most important primitive. Currently, the pipeline never reads cited URLs — it gets search snippets and trusts the LLM to cite accurately. The source-fetcher creates ground truth by actually fetching and caching source content.
This unlocks:
- Citation auditor: can verify claims against actual source text
- Grounded writer: can be constrained to only cite from fetched+cached sources
- Claim map: mechanical link from "claim in output" → "quote in fetched source"
- Existing `crux citations verify`: upgrades from "is URL alive?" to "does URL support the claim?"
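A hypothetical sketch of the source-cache primitive: fetch once, cache by URL, and serve extracted text to downstream modules. The extraction step is stubbed as an injected function; a real implementation would do readability extraction and markdown cleanup there. All names are assumptions.

```typescript
interface CachedSource {
  url: string;
  fetchedAt: string;
  text: string; // cleaned markdown
}

class SourceCache {
  private store = new Map<string, CachedSource>();

  // `extract` stands in for the real fetch + content-extraction step.
  constructor(private extract: (url: string) => string) {}

  get(url: string): CachedSource {
    const hit = this.store.get(url);
    if (hit) return hit; // reuse across runs: no refetch, $0
    const entry: CachedSource = {
      url,
      fetchedAt: new Date().toISOString(),
      text: this.extract(url),
    };
    this.store.set(url, entry);
    return entry;
  }
}

// Repeated lookups of the same URL hit the cache, not the network.
let fetches = 0;
const cache = new SourceCache((url) => {
  fetches++;
  return `content of ${url}`;
});
cache.get("https://example.org/a");
const second = cache.get("https://example.org/a");
```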
Claim Map: Mechanical Grounding Contract
The rewrite-section module outputs an explicit claim map alongside the content:
```json
{
  "content": "...improved section MDX...",
  "claimMap": [
    { "claim": "Anthropic raised \$4B in 2024", "factId": "f-012", "sourceUrl": "https://..." },
    { "claim": "Founded in 2021 by Dario Amodei", "factId": "f-003", "sourceUrl": "https://..." }
  ],
  "ungroundedClaims": ["Anthropic is widely considered a leader in AI safety"]
}
```
This is mechanically verifiable: check that every factId exists in the source cache and the claim matches the extracted quote. Ungrounded claims are flagged for human review or removal.
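The mechanical check might look like the following sketch: every `factId` in the claim map must exist in the source cache, and ungrounded claims are surfaced for review. The shapes mirror the JSON example above; the audit logic itself is an assumption.

```typescript
interface ClaimEntry {
  claim: string;
  factId: string;
  sourceUrl: string;
}

interface ClaimMapOutput {
  content: string;
  claimMap: ClaimEntry[];
  ungroundedClaims: string[];
}

// Audit: flag factIds missing from the cache and pass through the
// claims the writer itself marked as ungrounded.
function auditClaimMap(
  output: ClaimMapOutput,
  knownFactIds: Set<string>,
): { missingFacts: string[]; flagged: string[] } {
  const missingFacts = output.claimMap
    .filter((c) => !knownFactIds.has(c.factId))
    .map((c) => c.factId);
  return { missingFacts, flagged: output.ungroundedClaims };
}

const report = auditClaimMap(
  {
    content: "...",
    claimMap: [
      { claim: "A", factId: "f-012", sourceUrl: "https://example.org" },
      { claim: "B", factId: "f-003", sourceUrl: "https://example.org" },
    ],
    ungroundedClaims: ["C"],
  },
  new Set(["f-012"]), // cache only knows f-012
);
// report flags f-003 as missing and surfaces claim "C" for review
```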
Budget as Tool-Call Limits
Instead of tiers selecting fixed phase sequences, the agent gets a budget and decides how to spend it:
| Budget | Max Tool Calls | Research Queries | Enabled Tools |
|---|---|---|---|
| Polish | 8 | 0 | rewrite-section, add-entity-links, add-fact-refs, validate |
| Standard | 20 | 5 | All tools |
| Deep | 50 | 15 | All tools |
The agent plans its approach based on page state (quality score, citation count, entity link density, diagram count) and budget constraints.
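The budget table above could be encoded directly as data the orchestrator consults before each tool call. The numbers mirror the table; the enforcement helper is an assumption about how the agent loop would use it.

```typescript
interface Budget {
  maxToolCalls: number;
  researchQueries: number;
  enabledTools: string[] | "all";
}

// Values taken from the budget table above.
const BUDGETS: Record<"polish" | "standard" | "deep", Budget> = {
  polish: {
    maxToolCalls: 8,
    researchQueries: 0,
    enabledTools: ["rewrite-section", "add-entity-links", "add-fact-refs", "validate"],
  },
  standard: { maxToolCalls: 20, researchQueries: 5, enabledTools: "all" },
  deep: { maxToolCalls: 50, researchQueries: 15, enabledTools: "all" },
};

// A call is allowed only if the budget has headroom and the tool is enabled.
function canCall(budget: Budget, tool: string, callsSoFar: number): boolean {
  if (callsSoFar >= budget.maxToolCalls) return false;
  return budget.enabledTools === "all" || budget.enabledTools.includes(tool);
}
```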
Cross-Context Reuse
The same modules compose in every context:
| Context | Modules Used |
|---|---|
| Improve pipeline | All modules via agent orchestrator |
| Auto-update | research-agent + rewrite-section + citation-auditor |
| Page creation | research-agent + rewrite-section (all sections) + all enrichment |
| Citation health check | source-fetcher + claim-verifier (no agent needed) |
| Batch entity-linking | add-entity-links across many pages (no agent) |
| Manual editing assist | research-agent → cache → human edits → citation-auditor |
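The table above relies on modules being freely composable. A minimal sketch of that property, with illustrative module names and a simplified synchronous page state (real modules would be async and carry much richer state):

```typescript
// Sketch of cross-context module composition: the same modules chain into
// fixed pipelines (no agent) or are handed to an orchestrator as tools.
interface PageState {
  mdx: string;
  trace: string[]; // which modules ran, useful for auditing
}

type Module = (page: PageState) => PageState;

// Stand-in module factory; real modules do actual work on the page.
const makeModule = (name: string): Module =>
  (page) => ({ ...page, trace: [...page.trace, name] });

const sourceFetcher = makeModule("source-fetcher");
const claimVerifier = makeModule("claim-verifier");

function compose(...modules: Module[]): Module {
  return (page) => modules.reduce((acc, m) => m(acc), page);
}

// Citation health check: a fixed two-module pipeline, no agent needed.
const citationHealthCheck = compose(sourceFetcher, claimVerifier);
```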
Implementation: Foundation Issues
The first four modules to build (GitHub issues):
- Source Fetcher (#633) — fetch URLs, extract content, cache results. Foundation for everything else.
- Section-Level Grounded Writer (#634) — rewrite one section constrained to source cache, output claim map.
- Citation Auditor (#635) — independent per-citation verification using fetched source content.
- Standalone Enrichment Tools (#636) — entity-links and fact-refs as independent, idempotent tools.
Dependency order: #633 is the foundation. #634 and #635 depend on it. #636 is independent. The orchestrator agent (Part 3's Phase 3) is built last, after the tools exist.
Part 12: Academic Foundations for Claim-First Content
Recent academic work provides strong validation for the proposition-level and claim-first approaches described in this document and the companion Claim-First Architecture proposal (now partially superseded by the KB system).
Proposition-Level Retrieval (Dense X Retrieval)
Chen et al. (EMNLP 2024) define a proposition as an atomic, self-contained factual expression with three properties: minimal (cannot be further split), self-contained (includes resolved coreferences), and compositional (the union of all propositions reconstructs full semantics). They built a "Propositionizer" model that decomposes text into propositions and showed that indexing Wikipedia at proposition level significantly outperformed both passage-level and sentence-level indexing for retrieval and QA tasks.1
This is direct evidence for the claim-first thesis: atomic propositions are better retrieval units than paragraphs or documents.
Decompose-Then-Verify (FActScore)
FActScore (Min et al., EMNLP 2023) formalized the decompose-then-verify pipeline: decompose generated text into atomic facts, retrieve evidence, verify each fact, compute the percentage supported. Applied to ChatGPT biographies, only 58% of atomic facts were supported by sources.2 Google's SAFE system extends this with multi-step search verification at 1/20th the cost of human annotators.
Our pipeline applies the same decomposition proactively — producing atomic claims before writing rather than extracting them afterward. The Kalshi experiment (see the Claim-First Architecture proposal, Part 9b) confirmed this catches embellishments that post-hoc verification misses.
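The FActScore metric itself is simple once decomposition and per-fact verification have happened (both are LLM/retrieval work outside this sketch; the types here are illustrative):

```typescript
// Sketch of the FActScore aggregation step: the fraction of atomic facts
// in a generated text that are supported by retrieved evidence.
interface AtomicFact {
  text: string;
  supported: boolean; // set by the per-fact verification step
}

function factScore(facts: AtomicFact[]): number {
  if (facts.length === 0) return 1; // vacuously factual: nothing to refute
  return facts.filter((f) => f.supported).length / facts.length;
}
```

Under this metric, the 58% figure for ChatGPT biographies means that roughly 4 in 10 atomic facts failed verification, which is why the proposal treats verification as a first-class pass rather than an afterthought.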
Nanopublications: The Formal Precedent
The Semantic Web community's nanopublication framework is the academic formalization of our knowledge bundle idea. Each nanopublication has three parts: Assertion (the claim itself as RDF triples), Provenance (how the assertion came about — methods, evidence), and Publication Info (metadata). Nanopublications are immutable, cryptographically verifiable, and operate on a decentralized server network.3
The micropublication extension adds richer argumentation structure: a statement plus supporting evidence, interpretations, and challenges.4 This maps directly to our analytical claim type with its supportingClaims and reasoning fields.
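The three-part nanopublication shape maps onto a plain data structure. A sketch using object fields rather than RDF named graphs (field names are illustrative, not the nanopublication spec's vocabulary):

```typescript
// Sketch of the nanopublication's three-part structure:
// Assertion (the claim), Provenance (how it came about), Publication Info.
interface Triple {
  subject: string;
  predicate: string;
  object: string;
}

interface Nanopublication {
  assertion: Triple[];                                 // the claim itself
  provenance: { method: string; evidence: string[] };  // methods, evidence
  publicationInfo: { author: string; createdAt: string }; // metadata
}

// A nanopublication is only well-formed when all three parts are present.
function isWellFormed(np: Nanopublication): boolean {
  return (
    np.assertion.length > 0 &&
    np.provenance.evidence.length > 0 &&
    np.publicationInfo.author.length > 0
  );
}
```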
Knowledge-Centric Templatic Views (SURe)
A January 2024 paper introduces Structure Unified Representation — capturing the most important knowledge from a document in a structured format, then generating multiple view types (slide decks, newsletters, reports) from that single representation with no supervision.5 This is the closest published validation of the "multiple presentation layers from one data layer" pattern that both the multi-agent architecture and claim-first architecture rely on.
GPTKB: Warning on LLM-Generated Knowledge
GPTKB (2024-2025) built a knowledge base entirely from LLM output: 105 million triples for 2.9 million entities. Their critical finding: accuracy falls far short of existing curated projects — the LLM generates many incorrect and unverifiable facts.6 This strongly validates our design decision that verification must precede storage. The claim-first architecture's insistence on per-claim verification before claims enter the store directly addresses GPTKB's accuracy problems.
Multi-Agent Verification (KARMA)
KARMA (2025) uses nine collaborative agents for entity discovery, relation extraction, schema alignment, and conflict resolution. Tested on 1,200 PubMed articles: 38,230 new entities with 83.1% verified correctness, reducing conflict edges by 18.6%.7 The modular, multi-agent design with cross-agent verification maps directly to our specialist agent architecture.
Block-Based Knowledge Tools
Production tools like Roam Research, Logseq, and Notion demonstrate that block-level (claim-level) architectures work at scale. Logseq's dual-database design (in-memory Datascript + persistent SQLite) with block-level content, typed properties, and Datalog queries is particularly instructive for the claim store's eventual database-backed implementation (Option C in the claim-first architecture).8
Argumentation Frameworks
For AI safety topics where many claims involve genuine disagreement (timelines, risk levels, alignment difficulty), claim-augmented argumentation frameworks provide formal tools for representing competing positions. These extend Dung's abstract argumentation by associating a claim to each argument, enabling re-interpretation at different evaluation stages.9 This maps to our consensus and analytical claim types and suggests the claim store should eventually support explicit argumentation structure.
Footnotes
1. Dense X Retrieval: What Retrieval Granularity Should We Use? (Chen et al., EMNLP 2024). See also Factoid Wiki. ↩
2. FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation (Min et al., EMNLP 2023) ↩
3. Nanopublication-based semantic publishing and reviewing (PeerJ Computer Science, 2023). See also Nanopublication Guidelines. ↩
4. Micropublications: a semantic model for claims, evidence, arguments and annotations (Journal of Biomedical Semantics, 2014) ↩
5. Knowledge-Centric Templatic Views of Documents (arXiv, January 2024) ↩
6. GPTKB: Building Very Large Knowledge Bases from Language Models (arXiv, November 2024). See also gptkb.org. ↩
7. Citation rc-c558 (data unavailable — rebuild with wiki-server access) ↩
8. Logseq Architecture — Block-Level Database Design (DeepWiki) ↩
9. Claim-augmented argumentation frameworks (Artificial Intelligence, 2023) ↩