# AI-Assisted Research Workflows: Best Practices
## Executive Summary

| Finding | Key Insight | Recommendation |
|---|---|---|
| Plan before executing | Claude tends to jump straight to writing without sufficient research | Use explicit “research → plan → execute” phases |
| Opus for strategy, Sonnet for execution | Model selection matters by phase | Spend budget on thinking (Opus), not typing (Sonnet) |
| Deep Research APIs exist | Perplexity Sonar, OpenAI Deep Research, Gemini Deep Research | Consider OpenRouter for Perplexity API access |
| Context assembly is underrated | LLMs work better with curated context than raw search | Pre-gather resources before AI reasoning |
| Multi-agent beats monolithic | Specialized agents outperform single prompts | Separate researcher, writer, validator roles |
## Background

This report surveys best practices for AI-assisted research workflows in 2025-2026, drawing from:
- Anthropic’s Claude Code best practices
- Multi-agent orchestration frameworks (LangChain, CrewAI)
- Deep research API providers (Perplexity, OpenAI, Google, xAI)
- Academic work on autonomous research agents (Agent Laboratory)
## The Research Pipeline Problem

### Why Single-Shot Prompts Fail

A typical approach:

```
"Write a comprehensive article about compute governance"
```

This fails because:
- No context gathering - AI uses only training data, misses recent developments
- No strategic planning - AI doesn’t think about what actually matters
- Premature writing - Starts generating prose before understanding the topic
- No validation - Errors compound without feedback loops
### The Better Architecture

```
Context Assembly → Strategic Planning → Targeted Research → Drafting → Validation → Grading
    (local)            (Opus)            (Perplexity)      (Sonnet)    (scripts)    (Haiku)
```
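The rest of this report walks through each phase. As an orienting sketch, the whole pipeline can be read as one function whose helpers (all hypothetical names, declared as stubs here) correspond to Phases 1-6:

```ts
// Hypothetical phase helpers; each wraps one phase described below.
declare function assembleContext(topic: string): Promise<string>;
declare function planWithOpus(topic: string, context: string): Promise<{ researchQuestions: string[] }>;
declare function deepResearch(questions: string[]): Promise<string>;
declare function draftWithSonnet(plan: unknown, research: string): Promise<string>;
declare function validateAndFix(draft: string): Promise<string>;
declare function gradeWithHaiku(draft: string): Promise<number>;

// End-to-end pipeline: cheap context first, expensive reasoning in the
// middle, mechanical validation and grading at the end.
async function producePage(topic: string): Promise<string> {
  const context = await assembleContext(topic);                 // Phase 1
  const plan = await planWithOpus(topic, context);              // Phase 2
  const research = await deepResearch(plan.researchQuestions);  // Phase 3
  let draft = await draftWithSonnet(plan, research);            // Phase 4
  draft = await validateAndFix(draft);                          // Phase 5
  const grade = await gradeWithHaiku(draft);                    // Phase 6
  if (grade < 60) throw new Error('Below quality gate; rework or reject');
  return draft;
}
```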
## Phase 1: Context Assembly

**Goal:** Gather everything relevant before invoking expensive AI reasoning.
### What to Gather

| Source | Method | Cost |
|---|---|---|
| Related wiki pages | Glob/Grep for topic mentions | Free |
| Existing resources | Query resources database | Free |
| Entity relationships | Check backlinks, cross-refs | Free |
| Summarize context | Haiku to compress | $0.01-0.05 |
### Why This Matters

Curated context beats raw search: pre-gathering related pages, resources, and entity relationships lets the expensive planning model reason over relevant material instead of spending tokens rediscovering it.

### Implementation Pattern
Section titled “Implementation Pattern”// 1. Find related pages (free)const relatedPages = await searchWiki(topic);
// 2. Find existing resources (free)const resources = await queryResourcesDB(topic);
// 3. Summarize with Haiku ($0.02)const contextBundle = await summarizeContext({ model: 'haiku', pages: relatedPages, resources: resources});Phase 2: Strategic Planning (Opus)
**Goal:** Figure out what this article should actually cover and why.
### What Opus Should Decide

| Question | Why It Matters |
|---|---|
| What are the key cruxes/debates? | Structures the entire article |
| What’s the right framing? | Determines reader takeaway |
| What’s already well-covered elsewhere? | Avoids duplication |
| What specific questions need external research? | Directs Phase 3 |
| What’s the relationship to existing pages? | Enables cross-linking |
### Prompt Pattern

```
Given this context bundle about [TOPIC]:

[CONTEXT_BUNDLE]

You are planning a wiki article. Before any writing, think through:

1. **Cruxes**: What are the 2-3 key debates or uncertainties about this topic?
2. **Framing**: What's the most useful frame for readers? (risk? opportunity? tradeoff?)
3. **Gap analysis**: What does existing coverage miss?
4. **Research questions**: What specific questions need external research?
5. **Structure**: What sections would best serve readers?

Do NOT write the article. Output a structured plan in JSON.
```
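For concreteness, here is a sketch of sending this prompt through the Anthropic SDK and parsing the JSON plan; the model id and the `buildPlanningPrompt` helper are placeholders, not fixed names.

```ts
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// buildPlanningPrompt is a hypothetical helper that fills in the template above;
// the model id is a placeholder, so check current model names before relying on it.
const response = await anthropic.messages.create({
  model: 'claude-opus-4-5',
  max_tokens: 4096,
  messages: [{ role: 'user', content: buildPlanningPrompt(topic, contextBundle) }],
});

// The plan comes back as JSON text in the first content block.
const block = response.content[0];
const plan = block.type === 'text' ? JSON.parse(block.text) : null;
```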
### Budget

| Complexity | Est. Input | Est. Output | Cost |
|---|---|---|---|
| Simple topic | 20K tokens | 2K tokens | $0.50-1.00 |
| Complex topic | 50K tokens | 5K tokens | $2.00-4.00 |
## Phase 3: Targeted Research

**Goal:** Fill specific gaps identified in the plan, not open-ended browsing.
### Option A: Claude Code WebSearch

Uses Claude's built-in web search. Good integration but limited depth.

```js
// Directed by the plan
for (const question of plan.researchQuestions) {
  await webSearch(question);
}
```

**Cost:** Included in Claude API pricing
**Depth:** Moderate (single search per query)
### Option B: Perplexity Sonar via OpenRouter

Perplexity Sonar Deep Research is purpose-built for comprehensive research. Available via the OpenRouter API.
| Model | Use Case | Pricing |
|---|---|---|
| sonar | Quick lookups | $1/1M tokens |
| sonar-pro | Deeper search | $3/1M tokens + $5/1K searches |
| sonar-deep-research | Comprehensive reports | $3/1M tokens + $5/1K searches |
**Integration example:**

```js
import OpenAI from 'openai';

const openrouter = new OpenAI({
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: process.env.OPENROUTER_API_KEY,
});

const research = await openrouter.chat.completions.create({
  model: 'perplexity/sonar-deep-research',
  messages: [{
    role: 'user',
    content: `Research: ${plan.researchQuestions.join('\n')}`,
  }],
});
```

### Option C: Other Deep Research APIs
| Provider | API Available? | Notes |
|---|---|---|
| Perplexity | ✅ via OpenRouter | Best for research depth |
| OpenAI Deep Research | ⚠️ Limited | Azure AI Foundry only |
| Gemini Deep Research | ❌ | No API (consumer only) |
| Grok DeepSearch | ⚠️ Limited | xAI API, X integration |
### Option D: Open Source

Open Deep Research (HuggingFace) provides an open-source implementation with 10K+ GitHub stars.
## Phase 4: Drafting (Sonnet)

**Goal:** Execute the plan with research in hand.
### Prompt Pattern

```
You are writing a wiki article based on this plan:

[OPUS_PLAN]

Using these research findings:

[CURATED_RESEARCH]

Following this style guide:

[STYLE_GUIDE_EXCERPT]

Write the article. Use tables over bullet lists. Include citations.
Escape all dollar signs (\$100 not $100).
```
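The drafting call mirrors the planning call. A sketch, assuming the `anthropic` client from the planning example and hypothetical `plan`, `research`, and `styleGuideExcerpt` values already in scope:

```ts
// Assemble the drafting prompt from the plan, research, and style guide,
// then send it to Sonnet (model id is a placeholder).
const draftPrompt = [
  'You are writing a wiki article based on this plan:', JSON.stringify(plan),
  'Using these research findings:', research,
  'Following this style guide:', styleGuideExcerpt,
  'Write the article. Use tables over bullet lists. Include citations.',
  'Escape all dollar signs (\\$100 not $100).',
].join('\n\n');

const draftResponse = await anthropic.messages.create({
  model: 'claude-sonnet-4-5',
  max_tokens: 8192,
  messages: [{ role: 'user', content: draftPrompt }],
});
```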
### Why Sonnet, Not Opus?

Drafting is execution, not strategy. Sonnet:
- Follows instructions well
- Costs 1/10th of Opus
- Produces similar prose quality when given a good plan
**Cost:** $0.50-1.50 per article
## Phase 5: Validation

**Goal:** Catch errors before they compound.
### Automated Checks (Free)

```sh
npm run crux -- validate compile       # Syntax errors
npm run crux -- validate unified --rules=dollar-signs,comparison-operators
npm run crux -- validate entity-links  # Broken links
```

### Fix Loop (Haiku)
Section titled “Fix Loop (Haiku)”If validation fails, use Haiku to fix mechanical issues:
if (validationErrors.length > 0) { await fixWithHaiku(draft, validationErrors); // Re-validate}Cost: $0.02-0.10 per fix cycle
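Because a fix can itself introduce a new error, it helps to bound the loop. In this sketch, `validate` and `fixWithHaiku` are hypothetical stand-ins for the project's validation scripts and the Haiku call:

```ts
declare function validate(draft: string): Promise<string[]>;
declare function fixWithHaiku(draft: string, errors: string[]): Promise<string>;

// Bounded validate-fix loop: cap the rounds so Haiku can't ping-pong
// between two mechanical fixes forever.
async function validateAndFix(draft: string, maxRounds = 3): Promise<string> {
  let errors = await validate(draft);
  for (let round = 0; errors.length > 0 && round < maxRounds; round++) {
    draft = await fixWithHaiku(draft, errors); // mechanical fixes only
    errors = await validate(draft);            // re-check after each fix
  }
  if (errors.length > 0) throw new Error('Still failing validation; escalate to a human');
  return draft;
}
```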
## Phase 6: Grading

**Goal:** Ensure quality meets threshold before accepting.

Use existing grading infrastructure:

```sh
node scripts/content/grade-by-template.mjs --page new-article
```

### Quality Gates
| Grade | Action |
|---|---|
| Q4-Q5 (80+) | Accept |
| Q3 (60-79) | Targeted improvements |
| Q1-Q2 (below 60) | Significant rework or reject |
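Expressed as code, the gate is a small function; this sketch assumes the grader returns a 0-100 score:

```ts
// Quality gate, following the bands in the table above.
function qualityGate(score: number): 'accept' | 'improve' | 'rework' {
  if (score >= 80) return 'accept';   // Q4-Q5: accept
  if (score >= 60) return 'improve';  // Q3: targeted improvements
  return 'rework';                    // Q1-Q2: significant rework or reject
}
```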
## Cost Comparison

### Old Approach: Single-Shot Opus

| Component | Cost |
|---|---|
| Opus writes entire article | $3-5 |
| Often needs rework | +$2-3 |
| Total | $5-8 |
| Quality | Inconsistent |
### New Pipeline Approach

| Phase | Model | Cost |
|---|---|---|
| Context assembly | Haiku | $0.05 |
| Strategic planning | Opus | $1.50-3.00 |
| Deep research | Perplexity | $0.50-1.00 |
| Drafting | Sonnet | $0.50-1.00 |
| Validation | Local | $0.00 |
| Fixes | Haiku | $0.05-0.10 |
| Grading | Haiku | $0.05 |
| Total | | $2.65-5.20 |
| Quality | | More consistent |
## Multi-Agent Architectures

For complex articles, consider specialized agents:
### Agent Laboratory Pattern

Agent Laboratory (arXiv 2025) achieves 84% cost reduction using three stages:

1. **Literature review agent** - Gathers sources
2. **Experimentation agent** - Tests claims
3. **Report writing agent** - Produces output
### CrewAI Pattern

```python
from crewai import Agent, Task, Crew

researcher = Agent(role='Researcher', goal='Find authoritative sources')
analyst = Agent(role='Analyst', goal='Identify key insights')
writer = Agent(role='Writer', goal='Produce clear prose')

crew = Crew(agents=[researcher, analyst, writer], tasks=[...])
```

### Master-Planner-Executor-Writer
Multi-agent search architecture (a sketch of the Planner/Executor handoff follows the list):
- Master: Coordinates overall workflow
- Planner: Decomposes tasks into DAG
- Executor: Runs tool calls
- Writer: Synthesizes into prose
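One way to make the Planner → Executor handoff concrete is a task DAG that the Executor walks in dependency order; the types and loop below are illustrative, not from any particular framework:

```ts
// Illustrative shapes for the Planner's task DAG.
interface ResearchTask {
  id: string;
  question: string;    // what the Executor should look up
  dependsOn: string[]; // ids of tasks whose output this one needs
}

// The Executor runs tasks whose dependencies are satisfied, in rounds.
async function execute(tasks: ResearchTask[], run: (t: ResearchTask) => Promise<string>) {
  const done = new Map<string, string>();
  while (done.size < tasks.length) {
    const ready = tasks.filter(
      (t) => !done.has(t.id) && t.dependsOn.every((d) => done.has(d)),
    );
    if (ready.length === 0) throw new Error('Cycle in task DAG');
    for (const t of ready) done.set(t.id, await run(t));
  }
  return done; // the Writer synthesizes these results into prose
}
```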
## Implementation Recommendations

### For LongtermWiki

| Priority | Recommendation |
|---|---|
| High | Convert page-improver to SDK (done ✅) |
| High | Add context assembly phase before Opus |
| Medium | Integrate Perplexity via OpenRouter for deep research |
| Medium | Create page-creator with full pipeline |
| Low | Explore multi-agent CrewAI for complex topics |
### Environment Setup

```sh
# .env additions
ANTHROPIC_API_KEY=sk-ant-...   # For Claude Code SDK
OPENROUTER_API_KEY=sk-or-...   # For Perplexity access
```

### Budget Guidelines
| Article Type | Budget | Model Mix |
|---|---|---|
| Simple stub expansion | $1-2 | Haiku + Sonnet |
| Standard knowledge-base page | $3-5 | Opus planning + Sonnet execution |
| Complex research report | $5-10 | Opus + Perplexity + Sonnet |
| Flagship article | $10-15 | Opus + Deep Research + Opus review |
## Products & APIs Landscape

This section catalogs tools available for AI-assisted research workflows as of early 2026.
### Web Search / Grounding APIs

These provide real-time web search for grounding LLM responses with citations.
| Product | Pricing | Key Features | Best For |
|---|---|---|---|
| Perplexity Sonar | $1-5/1K queries | Deep research mode, multi-step reasoning | Comprehensive research |
| Exa AI | $5/1K queries + free tier | Semantic search, embeddings-based, research agents | AI-native search |
| Tavily | $0.008/credit, 1K free/mo | SOC 2 certified, LangChain native, MCP support | Production RAG pipelines |
| You.com API | Tiered plans | 93% SimpleQA score, MCP server, Deep Search | High-accuracy grounding |
| Brave Search API | $4/1K + $5/1M tokens | 94.1% F1 SimpleQA, AI Grounding mode | Privacy-focused, MCP |
| OpenRouter :online | $4/1K results | Works with any model, Exa-powered | Model flexibility |
### Academic Literature Search

Specialized tools for searching and analyzing scientific papers.
| Product | Pricing | Database | API? | Best For |
|---|---|---|---|---|
| Elicit | Freemium + paid plans | Semantic Scholar (200M papers) | ✅ | Systematic reviews, data extraction |
| Consensus | Freemium | Semantic Scholar | Limited | Evidence synthesis, yes/no questions |
| Undermind | $16/mo | Semantic Scholar, PubMed, arXiv | ❌ | Deep niche literature discovery |
| Semantic Scholar API | Free | 200M+ papers | ✅ | Building custom research tools |
| ResearchRabbit | Free | Cross-database | ❌ | Citation mapping, discovery |
### Specialized Corpus Search

| Product | Pricing | Corpus | API? | Best For |
|---|---|---|---|---|
| Scry | Free / $9/mo | 72M docs: arXiv, LessWrong, EA Forum, X, Wikipedia | ✅ SQL+vector | AI safety research, reproducible queries |
Scry Key Features:
- SQL + vector search with arbitrary query composition
- Semantic operations: mixing concepts, debiasing (“X but not Y”), contrastive axes
- Curated sources: arXiv, bioRxiv, PhilPapers, LessWrong, EA Forum, HN, Twitter/X, Bluesky
- Reproducibility: visible SQL queries, structured metadata, iterative refinement
- Custom embeddings: store named concept vectors for reuse
Scry vs ChatGPT Deep Research: Scry emphasizes control and reproducibility (you write SQL, see exactly what matched), while Deep Research is opaque but broader. Scry is better for iterative exploration of a fixed corpus; Deep Research for one-shot web synthesis.
### Scientific Research Agents

Full agentic systems for autonomous research.
| Product | Access | Focus | Key Capability |
|---|---|---|---|
| FutureHouse Falcon | Web + API | Scientific literature | Deep synthesis across thousands of papers |
| FutureHouse Crow | Web + API | Quick scientific Q&A | Fast factual answers with citations |
| OpenAI Deep Research | ChatGPT Pro/Plus | General research | Multi-step web research, o3-powered |
| Gemini Deep Research | Consumer only | General + Google ecosystem | Gmail, Drive, Docs integration |
| Grok DeepSearch | xAI API | General + X/Twitter | Real-time social + web, very fast |
### Web Scraping / Content Extraction

For when you need to extract content from specific URLs.
| Product | Pricing | Key Features |
|---|---|---|
| Firecrawl | $16-719/mo | LLM-ready markdown, 67% token reduction |
| Jina Reader | Free tier available | URL to markdown, simple API |
| Apify | Usage-based | Web scraping platform, many actors |
### MCP Servers for Claude Code

Model Context Protocol servers enable direct integration with Claude Code.
| MCP Server | Function | Source |
|---|---|---|
| Brave Search | Web search grounding | Official |
| Exa | Semantic web search | Official |
| Tavily | Search + extract | Official |
| You.com | Web search | Official |
| Perplexity | Deep research | Community |
| Firecrawl | URL scraping | Official |
### Cost Comparison Matrix

Estimated costs for a typical research task (finding 20 sources with summaries):
| Approach | Est. Cost | Quality | Speed |
|---|---|---|---|
| Manual Google + reading | $0 (your time) | High | Slow |
| Perplexity Sonar Deep Research | $0.50-1.00 | High | Fast |
| Exa Research Pro | $0.50-1.50 | High | Fast |
| Claude WebSearch (5 searches) | ≈$0.10 | Medium | Fast |
| Elicit (with extraction) | $0.50-2.00 | High (academic) | Medium |
| FutureHouse Falcon | Unknown | Very High (scientific) | Medium |
| OpenAI Deep Research | Included in $20/mo | High | Slow (minutes) |
## Open Questions

| Question | Why It Matters | Current State |
|---|---|---|
| Perplexity vs Exa vs Tavily? | Affects research phase design | Need empirical comparison |
| Optimal context window size? | Too much noise, too little misses info | ≈30-50K tokens seems good |
| Human-in-loop checkpoints? | Quality vs automation tradeoff | After planning phase? |
| Caching research results? | Reuse across similar articles | Not implemented yet |
| MCP vs direct API? | Integration complexity vs flexibility | MCP simpler but less control |
| FutureHouse for AI safety? | Scientific focus may miss grey literature | Worth testing |
## Sources

### Best Practices

- Claude Code: Best practices for agentic coding — Anthropic Engineering
- Multi-Step LLM Chains: Best Practices — Deepchecks
- 20 Agentic AI Workflow Patterns — Skywork AI
### Deep Research APIs

- Perplexity Sonar Deep Research — OpenRouter
- OpenRouter Web Search — Real-time web grounding for any model
- Introducing Deep Research — OpenAI
- Deep Research AI Tools Comparison — Bright Inventions
### Search APIs

- Exa AI — Semantic search API for AI applications
- Tavily — Web API for AI agents, $25M raise
- You.com APIs — 93% SimpleQA, Deep Search API
- Brave AI Grounding — 94.1% F1 score on SimpleQA
- Complete Guide to Web Search APIs 2025 — Firecrawl
### Academic Research Tools

- Elicit — AI research assistant, 200M papers
- Semantic Scholar — Free academic search API
- Scry — SQL+vector search over 72M docs (LessWrong, EA Forum, arXiv, etc.)
- 8 Best AI Research Assistant Tools — Documind
- Best AI Research Tools for Literature Review — Medium
### Scientific Research Agents

- FutureHouse Platform — Superintelligent AI agents for science
- FutureHouse customer story — Claude
- Agent Laboratory: Using LLM Agents as Research Assistants — arXiv 2025
### Research Frameworks

- Autonomous Agents papers — GitHub (updated daily)
- LLM Agents Explained: Complete Guide — Dynamiq
- Top 5 MCP Search Tools 2025 — Oreate AI
### Model Comparisons

- AI Deep Research: Claude vs ChatGPT vs Grok — AIMultiple
- Agentic LLMs in 2025 — Data Science Dojo
- Deep Research Survey — HuggingFace