AI for Human Reasoning Fellowship
Quick Assessment
| Dimension | Assessment | Evidence |
|---|---|---|
| Program Type | Fellowship + Incubator Hybrid | Combines research exploration with prototype building |
| Scale | 30 fellows | Inaugural cohort, July-October 2025 |
| Funding | $25K-$50K per fellow | Based on experience level |
| Outputs | 25+ projects | Most with working demos, papers, or deployed tools |
| Focus Area | Epistemic tools & coordination | AI for sensemaking, forecasting, negotiation, fact-checking |
| Location | SF Bay Area | Shared office space; some remote participation |
| Notable Achievement | First AI-approved Community Note | Nathan Young’s bot wrote world’s first approved AI community note on X |
Program Details
| Attribute | Details |
|---|---|
| Full Name | AI for Human Reasoning Fellowship |
| Organizer | Future of Life Foundation (FLF) |
| Duration | 12 weeks (July 14 - October 3, 2025) |
| Stipend | $25,000 (junior) to $50,000 (senior) |
| Fellows | 30 participants |
| Location | SF Bay Area shared office |
| Program Director | Ben Goldhaber |
| Program Managers | Timothy Telleen-Lawton, Kathleen Finlinson |
| Website | aiforhumanreasoning.com |
| Organizer Website | flf.org |
| Est. Program Cost | $1-2M total (stipends + operations) |
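The cost estimate in the last row can be sanity-checked from the stipend band alone. A minimal back-of-the-envelope check (the stipend figures come from the source; everything else is arithmetic):

```python
# Bound total stipend spend for the cohort from the published range.
fellows = 30
stipend_low, stipend_high = 25_000, 50_000  # junior vs. senior tier

stipends_min = fellows * stipend_low   # every fellow at the junior rate
stipends_max = fellows * stipend_high  # every fellow at the senior rate

print(f"stipends alone: ${stipends_min:,} - ${stipends_max:,}")
# -> stipends alone: $750,000 - $1,500,000
# Adding office space and staff, the stated $1-2M total is plausible.
```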
Overview
The AI for Human Reasoning Fellowship was an inaugural program run by the Future of Life Foundation (FLF) that brought together researchers, builders, and entrepreneurs to develop AI tools designed to enhance human reasoning and coordination.1 The program operated as a hybrid between a research fellowship and a startup incubator, with fellows receiving substantial stipends ($25K-$50K) to explore and prototype beneficial AI applications.
The fellowship addressed what FLF describes as a critical gap: the world is “radically underinvested” in AI applications that could enhance human decision-making and coordination capabilities.2 While much attention focuses on AI risks, relatively little goes toward building tools that could help humanity navigate complex challenges—including those posed by AI itself.
The program structure consisted of three phases:
- Explore phase: Research, discussion, and ideation on potential projects
- Build phase: Creating prototypes and real-world implementations
- Translation phase: Polishing work, reflection, and public presentation at demo day
Core Focus Areas
The fellowship targeted six key application areas for AI-augmented human reasoning:
| Area | Description | Example Projects |
|---|---|---|
| Epistemic Tools | Fact-checking, rhetoric detection, information verification | Community Notes AI, Evidentry, Epistemic Evals |
| Forecasting & Scenarios | Prediction markets, strategic foresight, scenario planning | Deliberation Markets, Deep Future, Sentinel |
| Negotiation | AI-mediated high-stakes bargaining | Negotiation Station |
| Decision Support | Reasoning scaffolds, bias navigation | Confidence Interval, Chord |
| Evaluations | Benchmarking epistemic virtue and AI trustworthiness | DeliberationBench, Society Library evals |
| Coordination | Consensus-finding, collective sensemaking | Polis 2.0, Pivotal, Updraft |
Notable Projects
Epistemic & Fact-Checking Tools
Community Notes AI (Nathan Young, Robert Gordon) Built an AI system that writes Community Notes for X (formerly Twitter). Achieved a significant milestone: the world’s first AI-written Community Note to be approved through X’s rating system. The bot’s notes have been viewed over 2.5 million times.3
Open Note Network (Steve Isley) An AI system generating Community Notes paired with a dedicated website hosting long-form fact checks, linking short-form social media corrections to comprehensive analysis.
AI for the Epistemic Commons (Herbie Bradley) Research on “Community Notes everywhere” for browsers and AI-written Wikipedia improvements. Built evaluations measuring model capability to fix errors and expand Wikipedia articles.
Evidentry (Agita Pasaribu) Coalition infrastructure connecting survivors, platforms, and regulators to verify and remove AI-generated intimate imagery. Features multi-detector aggregation and verification workflows that reduce removal time from days to minutes.
Forecasting & Strategic Foresight
Deliberation Markets (Siddarth Srinivasan) A novel prediction market mechanism in which participants write explanations supervised by LLMs. Instead of buying YES/NO contracts, users provide reasoning that LLMs synthesize into probability estimates.
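Deliberation Markets' actual aggregation rule isn't specified here, but one hypothetical way LLM-judged explanations could move a probability estimate is to combine quality-weighted shifts in log-odds space. A minimal sketch, with all function names and weights invented for illustration:

```python
import math

def logit(p: float) -> float:
    return math.log(p / (1 - p))

def sigmoid(x: float) -> float:
    return 1 / (1 + math.exp(-x))

def synthesize(prior: float, arguments: list[tuple[float, float]]) -> float:
    """Blend a market prior with LLM-judged arguments in log-odds space.

    Each argument is an (implied_probability, quality) pair: the probability
    the argument argues for, and a 0-1 quality score. In the real mechanism
    an LLM judge would supply both; here they are stand-ins.
    """
    x = logit(prior)
    for implied_p, quality in arguments:
        # Higher-quality reasoning pulls the estimate further toward its view.
        x += quality * (logit(implied_p) - logit(prior))
    return sigmoid(x)

# A single well-reasoned argument (quality 0.9) for ~80%, against a 32% prior,
# moves the synthesized probability substantially.
print(round(synthesize(0.32, [(0.80, 0.9)]), 2))
```

The toy parameters do not reproduce the demo-day numbers; the point is only that explanation quality, not trade volume, drives the update.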
Deep Future (Gordon Brander) Strategic foresight tool powered by scenario methods from US military and RAND. AI agents map strategic landscapes, identify driving forces, explore trajectories, and discover leverage points.
Sentinel (Nuño Sempere) Systems detecting and tracking global risks, including the “Eye of Sauron” monitoring system, the xrisk.fyi tool, automated Twitter reports, and forecasting infrastructure using HDBSCAN clustering for risk analysis.
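The clustering step can be illustrated in miniature. The sketch below uses scikit-learn's plain DBSCAN as a simpler stand-in for the HDBSCAN algorithm the writeup mentions, on invented toy "embedding" vectors:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Invented stand-in for embedded news items: two tight topic clusters
# plus a few scattered one-off stories.
rng = np.random.default_rng(0)
topic_a = rng.normal(loc=0.0, scale=0.05, size=(10, 8))
topic_b = rng.normal(loc=1.0, scale=0.05, size=(10, 8))
one_offs = rng.uniform(-2.0, 3.0, size=(3, 8))
embeddings = np.vstack([topic_a, topic_b, one_offs])

# DBSCAN groups dense regions and marks sparse points as noise (-1);
# HDBSCAN, which Sentinel reportedly uses, also adapts to varying density.
labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(embeddings)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(f"{n_clusters} clusters, {list(labels).count(-1)} noise items")
```

In a risk-monitoring pipeline, each recovered cluster would correspond to a developing story, while noise points are isolated items not yet worth escalating.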
Collective Decision-Making
Polis 2.0 (Colin Megill, Maximilian Kroner Dale) Real-time system gathering and analyzing what large groups think using advanced statistics and ML. Conducted a survey with 1,000+ quota-sampled Americans who voted 90,000+ times on 1,000+ statements about AI concerns.
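Polis's published pipeline projects the participant × statement vote matrix into a low-dimensional opinion space and clusters voters into groups. The sketch below mimics that on invented data; the bloc structure, group count, and "bridging" check are illustrative, not Polis 2.0's actual code:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Invented participant x statement vote matrix: 1 agree, -1 disagree, 0 pass.
# Statements 0-4 enjoy broad consensus; statements 5-19 split two opinion blocs.
rng = np.random.default_rng(42)
consensus = rng.choice([1, -1, 0], size=(100, 5), p=[0.80, 0.10, 0.10])
bloc_a = rng.choice([1, -1, 0], size=(50, 15), p=[0.85, 0.10, 0.05])
bloc_b = rng.choice([1, -1, 0], size=(50, 15), p=[0.10, 0.85, 0.05])
votes = np.hstack([consensus, np.vstack([bloc_a, bloc_b])])

# Project voters to 2D, then cluster into opinion groups (the classic Polis
# recipe; Polis 2.0 reportedly adds semantic clustering and LLM summaries).
coords = PCA(n_components=2).fit_transform(votes)
groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(coords)

# "Bridging" statements draw same-sign support from both opinion groups.
means = [votes[groups == g].mean(axis=0) for g in (0, 1)]
bridging = np.where(np.sign(means[0]) == np.sign(means[1]))[0]
print(f"bridging statements: {bridging.tolist()}")
```

The same-sign test recovers the consensus statements, which is the kind of signal behind the "bridging consensus on deep fakes and privacy" finding mentioned below.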
Chord (Alex Bleakley) AI-orchestrated communication tool that parallelizes conversations to help groups make better, faster decisions. Led to founding of Sylience post-fellowship.
Pivotal (Anand Shah, Parker Whitfill, Kai Sandbrink, Ben Sklaroff) Multi-agent orchestration tool helping teams coordinate on scheduling, action items, and organizational context. Integrates with existing software and automates workflows.
Updraft (Robert Gordon) Real-time facilitation tool where groups and AI collaboratively map, cluster, and evolve ideas on a shared 2D canvas. Part of a connected experiment suite including Winnow and Prune.
Evaluations & Benchmarks
DeliberationBench (Maximilian Kroner Dale, Luke Hewitt, Paul de Font-Reaulx) Novel AI persuasiveness benchmark using Deliberative Polls as a normative reference. Demonstrated findings in a 4,000-person randomized LLM persuasiveness experiment.
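A difference-in-means estimate is the simplest way to read out such a randomized experiment. The sketch below runs one on invented attitude data; the 0-100 scale, effect size, and arm sizes are placeholders, not figures from the benchmark:

```python
import numpy as np

# Invented post-treatment attitude scores (0-100 scale) for a control arm
# and an arm shown an LLM-written persuasive message.
rng = np.random.default_rng(1)
control = rng.normal(50.0, 15.0, size=2000)
treated = rng.normal(55.0, 15.0, size=2000)

# Average treatment effect and a 95% normal-approximation interval.
effect = treated.mean() - control.mean()
se = np.sqrt(treated.var(ddof=1) / treated.size
             + control.var(ddof=1) / control.size)
print(f"estimated persuasion effect: {effect:.1f} +/- {1.96 * se:.1f} points")
```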
Epistemic Evals (Alejandro Botas) Evaluated human and model outputs for epistemic quality attributes. Used LLMs to assess EA Forum posts on Reasoning/Clarity/Value dimensions and tested model sensitivity to epistemically irrelevant contexts.
Society Library (Jamie Joyce) Built a semi-automated information processing pipeline producing a 600+ page intelligence report on complex government events with multiple viewpoints. Researching how structured datasets can serve as benchmarks for LLM truth-seeking.
Other Notable Projects
| Project | Fellow(s) | Description | Video |
|---|---|---|---|
| Collective Agency | Blake Borgeson | Framework for increasing humanity’s collective agency through AI intermediaries. Presented motifs for group collaboration (facilitators, orchestrators, AI intermediaries) and advocates for “shovel-ready wisdom”—ideas for better collective processes ready to become technology. | Watch |
| Future Visions Hub | Sofia Vanhanen | Group decision-making software and epistemic infrastructure for collective sensemaking about desirable futures | Watch |
| Confidence Interval | Vaughn Tan | Self-service webapp using LLMs as Socratic mirror for making subjective arguments rigorous. Currently used by college students with interest from startups and governments. | Watch |
| Negotiation Station | Kai Sandbrink | AI tools for high-stakes negotiations between nations/corporations as trusted mediators | Watch |
| AI Policy Simulation | Alexander van Grootel, Emma Kumleben | AI aiding institutional decision-making through strategic foresight and forecasting for AGI transition navigation | Watch |
| Virtuous | Paul de Font-Reaulx | Epistemic evaluations for frontier models; developing DeliberationBench | Watch |
| RiskWatch | Alyssia Jovellanos, Martin Ciesielski-Listwan | Risk Threat Observatory enabled by prediction markets | Watch |
| Worker-Owned Startups | Ben Sklaroff | Governance models for worker-owned startup structures | Watch |
| Agent Strategy Arena | Joshua Levy | Platform for scalable, grounded evaluations of AI agents’ prediction accuracy | Watch |
| AI Discourse Sensemaking | Matthew Brooks, Emma Kumleben, Niki Dupuis | Using LLMs to map opinion landscapes and detect polarization; built semi-automated argument mapper | Watch |
Fellows
The 30 fellows came from diverse backgrounds including academia, entrepreneurship, policy, and technology:
| Fellow | Background/Affiliation |
|---|---|
| Blake Borgeson | Collaboration AI researcher |
| Colin Megill | Polis creator |
| Nathan Young | Manifold community, forecasting |
| Robert Gordon | Goodheart Labs |
| Herbie Bradley | AI researcher |
| Nuño Sempere | Samotsvety, QURI, Sentinel founder |
| Kai Sandbrink | Multi-project contributor |
| Gordon Brander | Strategic foresight |
| Jamie Joyce | Society Library |
| Sofia Vanhanen | Future Visions Hub |
| Agita Pasaribu | Evidentry |
| And 19 others | Various backgrounds |
Advisors
The fellowship included advisors with expertise in AI safety, mechanism design, and coordination:
- Anthony Aguirre - President of FLF, Executive Director of the Future of Life Institute
- Andreas Stuhlmüller - Founder of Elicit
- Brendan Fong - Category theory and applied mathematics
- Additional advisors from academia and industry
Relationship to FLI and FLF
The Future of Life Foundation (FLF) is a separate organization from the Future of Life Institute (FLI), though both share leadership (Anthony Aguirre serves as Executive Director of FLI and President of FLF). While FLI focuses primarily on existential risk advocacy and grantmaking, FLF operates more as an incubator for beneficial AI applications.
Outcomes and Impact
The fellowship produced several concrete outcomes:
| Outcome Type | Count/Details |
|---|---|
| Projects launched | 25+ with demos or papers |
| Open-source tools | Multiple GitHub repositories |
| Academic papers | DeliberationBench, Deliberation Markets |
| Companies founded | Sylience (from Chord project) |
| Real-world deployment | AI Community Notes with 2.5M+ views |
| Research artifacts | Evaluations, benchmarks, datasets |
FLF indicated willingness to provide funding beyond the fellowship period or assist fellows in launching new organizations based on their work.2
Demo Day Presentations
All fellows presented their work at a demo day. Video presentations with auto-generated transcripts are available on YouTube.
| Project | Presenter(s) | Video |
|---|---|---|
| Collective Agency | Blake Borgeson | Watch |
| Pivotal | Anand Shah, Parker Whitfill, Kai Sandbrink, Ben Sklaroff | Watch |
| Polis 2.0 | Colin Megill, Maximilian Kroner Dale | Watch |
| Deliberation Markets | Siddarth Srinivasan | Watch |
| Community Notes AI | Nathan Young, Robert Gordon | Watch |
| Open Note Network | Steve Isley | Watch |
| AI for Epistemic Commons | Herbie Bradley | Watch |
| Evidentry | Agita Pasaribu | Watch |
| Worker-Owned Startups | Ben Sklaroff | Watch |
| Society Library | Jamie Joyce | Watch |
| AI Discourse Sensemaking | Matthew Brooks, Emma Kumleben, Niki Dupuis | Watch |
| Confidence Interval | Vaughn Tan | Watch |
| Virtuous | Paul de Font-Reaulx | Watch |
| Epistemic Evals | Alejandro Botas | Watch |
| DeliberationBench | Max Kroner Dale, Luke Hewitt, Paul de Font-Reaulx | Watch |
| Forecasting & Provenance | Alyssia Jovellanos, Martin Ciesielski-Listwan | Watch |
| Agent Strategy Arena | Joshua Levy | Watch |
| AI Policy Simulation | Alexander van Grootel, Emma Kumleben | Watch |
| Deep Future | Gordon Brander | Watch |
| Future Visions Hub | Sofia Vanhanen | Watch |
| Tools for Sensemaking | Matthew Brooks | Watch |
| Negotiation Station | Kai Sandbrink | Watch |
| Chord | Alex Bleakley | Watch |
Presentation Highlights
Key insights from the demo day presentations (based on transcript analysis):
- Polis 2.0: Surveyed 1,000+ Americans who voted 90,000+ times on AI concerns. Found bridging consensus on deep fakes and privacy, with partisan split on regulation approach. New features include semantic topic clustering and LLM-generated consensus summaries.
- Deliberation Markets: Live demo showed market probability shifting 32%→65% from a single well-reasoned argument—demonstrating how explanation quality drives predictions. Core innovation: LLMs evaluate reasoning quality and trade on synthesized probabilities.
- Community Notes AI: Early notes reduce misleading tweet shares by 25-50%; late notes have almost no effect. Team estimates $150K-$500K needed to scale to TikTok, Chrome extensions, and Perplexity.
- Deep Future: Compresses week-long RAND-style scenario planning workshops into 10-15 minutes. Demo identified 38 driving forces for “How will AI agents transform the web by 2030?” and generated strategic reports with opportunities, threats, and early warning signals.
- Collective Agency: Blake Borgeson presented motifs for AI-human collaboration patterns—facilitators, orchestrators, and AI intermediaries—across collaboration phases (understand, explore, decide, coordinate, create, share, reflect).
Assessment
Strengths
| Strength | Evidence |
|---|---|
| Novel focus area | Few programs specifically target AI for epistemics/coordination |
| Concrete outputs | Most projects have working demos, not just research |
| Diverse approaches | Covered forecasting, fact-checking, negotiation, collective decision-making |
| Real deployment | Some tools already in use (Community Notes AI) |
| Open source | Many projects released code publicly |
Limitations
| Limitation | Notes |
|---|---|
| First cohort | No track record yet for long-term impact |
| Prototype stage | Most projects still early; unclear which will scale |
| Narrow ecosystem | Fellows largely from EA/rationalist adjacent networks |
| Evaluation difficulty | Hard to measure impact of “reasoning improvement” tools |
Related
| Item | Relationship |
|---|---|
| Future of Life Institute | Related organization; shared leadership with FLF |
| QURI | Similar focus on epistemic tools; Nuño Sempere connection |
| Manifold Markets | Prediction market platform; Nathan Young connection |
| Elicit | AI research tool; Andreas Stuhlmüller as advisor |
| AI Safety Training Programs | Complementary fellowship in different focus area |
Sources
Footnotes
1. FLF Fellowship on AI for Human Reasoning: $25-50k, 12 weeks, EA Forum
2. World’s First AI Community Note, Nathan Young’s Substack