
Research-First Page Creation Pipeline

Summary: Internal experiment documenting a multi-phase LLM pipeline that enforces citation discipline, cutting table rows by 97% while raising citation density from 0.9 to 2.3 per 100 words. The Standard tier ($10.50) achieved the same quality score (78/100) as Premium ($15), and the research-first approach produced 42 citations per article across three test topics.
| Finding | Result | Implication |
|---|---|---|
| Research-first works | All 3 test articles found Wikipedia, primary sources, AND critical perspectives | Front-loading research prevents hallucination |
| Citation discipline enforced | 42 inline citations per article (vs ≈40 poorly-sourced in original) | “Only use facts.json” rule eliminates unsourced claims |
| Tables dramatically reduced | 196 → 5 table rows (97% reduction) | Prose-first prompting produces readable content |
| Standard tier is optimal | $10.50 achieved same quality (78) as $15 Premium | Review → gap-fill → polish cycle is worth the cost; extra rewrite isn’t |
| Budget tier has known gaps | Verify phase identifies issues but can’t fix them | Good for drafts, not final articles |

The standard approach of prompting an LLM to “write an article about X” fails because:

  1. Writing before researching - LLM generates plausible content without verified sources
  2. No citation requirement - Facts appear without URLs
  3. Tables as a crutch - LLMs over-produce tables because they’re easy to generate
  4. No verification - Errors and gaps persist to final output

We built scripts/content/page-creator-v2.mjs with this structure:

Research → Extract → Synthesize → Review → Gap-fill → Polish

Key innovation: The synthesis phase receives only extracted facts with citations, not raw sources. The prompt explicitly says “If a fact isn’t in facts.json, DO NOT include it.”
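
As a rough sketch of the orchestration (hard-coding the standard phase list; `runPhase` and the exact artifact paths are placeholders, not the script’s real API):

```js
// Sketch only: hard-codes the standard-tier phase sequence described above.
// runPhase stands in for the per-phase LLM call that writes sources.json,
// facts.json, draft.mdx, etc. into the working directory.
import { mkdir } from "node:fs/promises";
import { join } from "node:path";

const PHASES = ["research", "extract", "synthesize", "review", "gap-fill", "polish"];

async function runPhase(phase, ctx) {
  // Placeholder: the real implementation prompts the configured model here.
  console.log(`[${ctx.topic}] running ${phase} in ${ctx.workDir}`);
}

export async function createPage(topic, tier = "standard") {
  const workDir = join(".claude/temp/page-creator", topic.toLowerCase().replace(/\s+/g, "-"));
  await mkdir(workDir, { recursive: true });
  for (const phase of PHASES) {
    await runPhase(phase, { topic, tier, workDir });
  }
  return join(workDir, "final.mdx");
}
```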


| Phase | Model | Budget | Purpose |
|---|---|---|---|
| research | Sonnet | $1-4 | Gather 10-16 sources via WebSearch/WebFetch |
| extract | Sonnet | $1.50 | Pull facts into structured JSON with citations |
| synthesize | Opus/Sonnet | $1.50-2.50 | Write article from facts.json ONLY |
| review | Opus | $1.50-2 | Identify gaps, bias, missing perspectives |
| gap-fill | Sonnet | $1.50 | Research topics identified as missing |
| polish | Opus | $1.50 | Integrate new facts, improve prose |
budget: research-lite → extract → synthesize-sonnet → verify (~$4.50)
standard: research → extract → synthesize-opus → review → gap-fill → polish (~$10.50)
premium: research-deep → extract → synthesize-opus → critical-review → gap-fill → rewrite → polish (~$15)
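
These tier definitions could be expressed as a simple lookup table; the sketch below mirrors the phase names and approximate costs above but is not the actual structure inside page-creator-v2.mjs:

```js
// Sketch: tier → phase-sequence mapping, taken from the tier descriptions above.
export const TIERS = {
  budget: {
    phases: ["research-lite", "extract", "synthesize-sonnet", "verify"],
    approxCost: 4.5,
  },
  standard: {
    phases: ["research", "extract", "synthesize-opus", "review", "gap-fill", "polish"],
    approxCost: 10.5,
  },
  premium: {
    phases: ["research-deep", "extract", "synthesize-opus", "critical-review", "gap-fill", "rewrite", "polish"],
    approxCost: 15,
  },
};
```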

The extract phase outputs structured JSON:

```json
{
  "facts": [
    {
      "claim": "LessWrong was founded in February 2009",
      "sourceUrl": "https://en.wikipedia.org/wiki/LessWrong",
      "sourceTitle": "Wikipedia",
      "confidence": "high"
    }
  ],
  "controversies": [...],
  "statistics": [...],
  "gaps": ["Topics we have no facts for"]
}
```
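
A lightweight shape check on facts.json before synthesis can catch extraction failures early. The field names follow the example above; the loader itself is a sketch, not part of the pipeline:

```js
// Sketch: refuse to synthesize from a facts.json whose facts lack a claim
// or a usable source URL. Field names match the example above.
import { readFile } from "node:fs/promises";

export async function loadFacts(path) {
  const data = JSON.parse(await readFile(path, "utf8"));
  const bad = (data.facts ?? []).filter(
    (f) => !f.claim || !f.sourceUrl?.startsWith("http")
  );
  if (bad.length > 0) {
    throw new Error(`${bad.length} fact(s) missing a claim or a usable sourceUrl`);
  }
  return data;
}
```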

The synthesize prompt then says:

“Every factual claim MUST have an inline citation. If a fact isn’t in facts.json, DO NOT include it.”
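
A sketch of how that constraint might be wired into the prompt, with the quoted rule embedded verbatim and the surrounding template invented for illustration:

```js
// Sketch: the synthesize phase sees only facts.json, never the raw sources.
// The citation rule is the one quoted above; the rest of the template is illustrative.
export function buildSynthesizePrompt(topic, facts) {
  return [
    `Write a knowledge base article about ${topic}.`,
    "Every factual claim MUST have an inline citation.",
    "If a fact isn't in facts.json, DO NOT include it.",
    "",
    "facts.json:",
    JSON.stringify(facts, null, 2),
  ].join("\n");
}
```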


We tested three topics, one per tier:

| Topic | Tier | Why Chosen |
|---|---|---|
| MIRI | Budget | Well-documented nonprofit, good for testing minimal pipeline |
| LessWrong | Standard | Existing page to compare against (quality 43) |
| Anthropic | Premium | High-profile, controversial, tests deep research |
For each run we tracked (see the measurement sketch after this list):

  • Total cost and time
  • Citation count
  • Table row count
  • Word count
  • Whether controversies section included
  • Self-assessed quality score
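
A rough sketch of how the mechanical metrics (citations, table rows, word count, citation density) can be computed from a generated .mdx file; the regexes are approximations, not the pipeline’s actual grading logic:

```js
// Sketch: approximate content metrics for a generated .mdx file.
// Counts URL-style citations, markdown table rows (excluding separator rows),
// and words; citation density is citations per 100 words.
import { readFile } from "node:fs/promises";

export async function measureDraft(path) {
  const text = await readFile(path, "utf8");
  const words = text.split(/\s+/).filter(Boolean).length;
  const citations = (text.match(/https?:\/\/\S+/g) ?? []).length;
  const tableRows = text.split("\n").filter((line) => {
    const t = line.trim();
    return t.startsWith("|") && !/^\|[\s:|-]+$/.test(t); // skip |---|---| separators
  }).length;
  return {
    words,
    citations,
    tableRows,
    citationDensity: words ? +((citations / words) * 100).toFixed(1) : 0,
  };
}
```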

All three pipelines completed successfully:

| Topic | Tier | Time | Cost | Phases |
|---|---|---|---|---|
| MIRI | Budget | 10m | $4.50 | 4/4 |
| LessWrong | Standard | 16m | $10.50 | 6/6 |
| Anthropic | Premium | 24m | $15.00 | 7/7 |

| Metric | MIRI (Budget) | LessWrong (Standard) | Anthropic (Premium) |
|---|---|---|---|
| Final Quality | 75* | 78 | 78 |
| Word Count | ≈2,700 | 2,480 | 2,850 |
| Citations | ≈35 | 42 | 42 |
| Tables | ≈3 | 1 | 1 |
| Has Controversies | Yes | Yes | Yes |

*Budget tier’s verify phase identified gaps but couldn’t fix them.

Compared with the existing LessWrong page:

| Aspect | Original | New (Standard) |
|---|---|---|
| Table rows | 196 | 5 |
| URLs/Citations | 41 | 46 |
| Citation density | 0.9/100 words | 2.3/100 words |
| Critical sources cited | 0 | 4 |
| Controversies | Superficial table | Full section with quotes |

All three pipelines found diverse source types:

LessWrong sources found:

  • Wikipedia article
  • Official LessWrong posts (history, surveys, FAQ)
  • EA Forum discussions
  • Critical perspectives: Bryan Caplan (Econlib), Tyler Cowen, Greg Epstein (NYT), RationalWiki

Anthropic sources found:

  • Wikipedia, official company page
  • Financial data (valuations, revenue)
  • Critical: SaferAI critique, White House feud coverage, deceptive AI behavior reports
  • Policy positions on SB 1047, export controls

The Anthropic critical-review phase identified:

  • “Quick Assessment table is overwhelmingly favorable”
  • “Company culture section reads like PR”
  • “Several interpretive statements presented as fact without sources”
  • “Missing: lobbying positions, concrete safety failures, competitor comparisons”

The gap-fill phase then researched exactly those topics and the rewrite integrated them.

Cost versus quality across the three tiers:

| Tier | Cost | Quality | Notes |
|---|---|---|---|
| Budget | $4.50 | 75 | Gaps identified but not fixed |
| Standard | $10.50 | 78 | Gaps fixed |
| Premium | $15.00 | 78 | Same quality, more thorough |

The extra $4.50 from Standard to Premium didn’t improve the quality score. The review→gap-fill→polish cycle is where the value is.

Explicit instructions matter:

  • “Maximum 4 tables”
  • “Minimum 60% prose”
  • “Tables are for genuinely comparative data, not lists”

Result: 97% reduction in table rows.
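
These constraints can also be verified mechanically after synthesis. A sketch, assuming the draft is markdown and using a crude prose-ratio heuristic (share of words outside table lines); none of this is the script’s actual checking code:

```js
// Sketch: enforce "maximum 4 tables" and "minimum 60% prose" on a markdown draft.
// Tables are counted as contiguous runs of |-prefixed lines; prose ratio is the
// share of words outside those runs. Crude heuristic, for illustration only.
export function checkProseConstraints(markdown, { maxTables = 4, minProseRatio = 0.6 } = {}) {
  let tables = 0;
  let inTable = false;
  let tableWords = 0;
  let totalWords = 0;

  for (const line of markdown.split("\n")) {
    const isTableLine = line.trim().startsWith("|");
    if (isTableLine && !inTable) tables += 1;
    inTable = isTableLine;
    const words = line.split(/\s+/).filter(Boolean).length;
    totalWords += words;
    if (isTableLine) tableWords += words;
  }

  const proseRatio = totalWords ? (totalWords - tableWords) / totalWords : 1;
  return { tables, proseRatio, ok: tables <= maxTables && proseRatio >= minProseRatio };
}
```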


Recommendations:

  1. Use Standard tier ($10.50) for most pages - Best quality/cost ratio
  2. Use Budget tier ($4.50) for drafts - Good starting point for human editing
  3. Reserve Premium ($15) for controversial topics - Extra scrutiny is valuable for Anthropic, OpenAI, etc.
Next steps:

  1. Add to package.json for easy access:

     ```json
     "scripts": {
       "create-page": "node scripts/content/page-creator-v2.mjs"
     }
     ```
  2. Consider batch mode - Run multiple Standard-tier pages overnight

  3. Integrate with grading - Auto-grade output and re-run if below threshold
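
For item 3, the loop might look something like the sketch below; createPage and gradePage are hypothetical stand-ins for the pipeline runner and the existing grading step, supplied by the caller:

```js
// Sketch for the grading integration: re-run the pipeline until the graded
// score clears a threshold, keeping the best attempt. createPage and gradePage
// are caller-supplied stand-ins, not names from the real scripts.
export async function createUntilGoodEnough(topic, createPage, gradePage, opts = {}) {
  const { threshold = 75, maxRuns = 2 } = opts;
  let best = null;
  for (let run = 0; run < maxRuns; run += 1) {
    const outputPath = await createPage(topic, "standard");
    const score = await gradePage(outputPath);
    if (best === null || score > best.score) best = { outputPath, score };
    if (score >= threshold) break;
  }
  return best;
}
```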

Future ideas:

  1. Perplexity Deep Research integration - Could improve research phase
  2. Human-in-the-loop review - Show review.json before gap-fill for approval
  3. Incremental updates - Re-run pipeline on existing pages to improve them

Files:

```
scripts/content/page-creator-v2.mjs    # The pipeline script
.claude/temp/page-creator/
├── miri/
│   ├── sources.json             # 10 sources
│   ├── facts.json               # 18 facts, 20 stats, 8 controversies
│   ├── draft.mdx                # Final output (budget has no polish)
│   └── review.json              # Identified but unfixed gaps
├── lesswrong/
│   ├── sources.json             # Research results
│   ├── facts.json               # Extracted claims
│   ├── draft.mdx                # Initial synthesis
│   ├── review.json              # Gap analysis
│   ├── additional-facts.json    # Gap-fill results
│   ├── final.mdx                # Polished output
│   └── summary.json             # Quality metrics
└── anthropic/
    └── [same structure as lesswrong]
```

Usage:

```bash
# Standard tier (recommended)
node scripts/content/page-creator-v2.mjs "Topic Name" --tier standard

# Budget tier (for drafts)
node scripts/content/page-creator-v2.mjs "Topic Name" --tier budget

# Premium tier (for controversial topics)
node scripts/content/page-creator-v2.mjs "Topic Name" --tier premium

# Copy output to a specific location
node scripts/content/page-creator-v2.mjs "Topic Name" --tier standard --output ./my-article.mdx
```

The research-first pipeline successfully addresses the core problem of AI-generated content: unsourced, table-heavy data dumps. By structuring the process as Research → Extract → Synthesize with explicit citation requirements, we produce articles that are:

  • Well-sourced (42 citations with URLs)
  • Readable (90% prose, not tables)
  • Balanced (includes critical perspectives)
  • Cost-effective ($10.50 for production quality)

The Standard tier is recommended for most use cases. The key insight is that research quality matters more than generation quality - you can’t synthesize what you haven’t found.