Forecasting Research Institute
Quick Assessment
| Dimension | Assessment | Evidence |
|---|---|---|
| Research Quality | Exceptional | Peer-reviewed publications in International Journal of Forecasting, ICLR |
| Methodology Innovation | High | XPT persuasion tournament methodology, ForecastBench benchmark |
| Influence | Growing | Cited in policy discussions, academic forecasting research |
| Leadership | World-class | Philip Tetlock (Chief Scientist), author of Superforecasting |
| Scale | Moderate | 169 participants in XPT, growing ForecastBench community |
| AI Relevance | Central | AI progress forecasting is major research focus |
| Key Finding | Striking | Superforecasters severely underestimated AI progress |
Organization Details
| Attribute | Details |
|---|---|
| Full Name | Forecasting Research Institute |
| Founded | 2021 |
| Chief Scientist | Philip Tetlock (author of Superforecasting and Expert Political Judgment) |
| CEO | Josh Rosenberg |
| Research Director | Ezra Karger (also Senior Economist, Federal Reserve Bank of Chicago) |
| Location | Philadelphia area / Remote |
| Status | 501(c)(3) research nonprofit |
| Website | forecastingresearch.org |
| Funding | Over $16M from Coefficient Giving (2021-present) |
| Key Outputs | XPT Tournament, ForecastBench (ICLR 2025), FRI-ONN Nuclear Study |
| Focus | Forecasting methodology for high-stakes decisions and existential risk |
Overview
The Forecasting Research Institute develops advanced forecasting methods to improve decision-making on high-stakes issues, with particular emphasis on existential risks and AI development. Founded in 2021 with initial support from Coefficient Giving and led by Chief Scientist Philip Tetlock—whose research established the field of superforecasting—FRI represents the next generation of forecasting research, moving from establishing accuracy standards to channeling forecasting into real-world policy relevance.
FRI’s flagship project, the Existential Risk Persuasion Tournament (XPT), introduced a multi-stage methodology designed to improve the rigor of debates about catastrophic risks. Unlike traditional forecasting tournaments that simply aggregate independent predictions, the XPT required participants to engage in structured debates, explain their reasoning, and update their forecasts through adversarial collaboration. Running from June through October 2022, the tournament brought together 169 participants who made forecasts about existential threats including AI, biosecurity, climate change, and nuclear war. The results, published in the International Journal of Forecasting in 2025, produced striking findings about the limits of current forecasting on AI progress.
The institute has documented a significant gap between superforecaster and domain expert predictions on AI, with superforecasters systematically underestimating the pace of AI progress. On questions about AI achieving gold-level performance on the International Mathematical Olympiad, superforecasters gave only 2.3% probability to outcomes that actually occurred in July 2025, compared to 8.6% from domain experts. Across four AI benchmarks (MATH, MMLU, QuALITY, and IMO Gold), superforecasters assigned an average probability of just 9.7% to outcomes that actually occurred, compared to 24.6% from domain experts. This finding has important implications for how AI timeline forecasts should be interpreted and weighted.
FRI’s more recent work includes ForecastBench, a dynamic benchmark for evaluating LLM forecasting capabilities published at ICLR 2025, and a collaboration with the Open Nuclear Network on nuclear catastrophe risk forecasting presented at the 2024 NPT PrepCom in Geneva.
Philip Tetlock: Research Background
Philip Tetlock’s foundational research provides the intellectual basis for FRI’s work. Understanding his four decades of forecasting research is essential to understanding FRI’s approach.
Expert Political Judgment (1984-2005)
Tetlock’s landmark study, summarized in Expert Political Judgment: How Good Is It? How Can We Know? (Princeton University Press, 2005), examined 28,000 forecasts from 284 experts across government, academia, and journalism over two decades. The sobering findings established core principles that still guide FRI’s methodology:
| Finding | Implication |
|---|---|
| Experts were often only slightly more accurate than chance | Traditional expertise is insufficient for forecasting |
| Simple extrapolation algorithms often beat expert forecasts | Formal methods can outperform intuition |
| Media-prominent forecasters performed worse than low-profile colleagues | Fame and accuracy are inversely correlated |
| “Foxes” (eclectic thinkers) outperformed “hedgehogs” (single-theory adherents) | Cognitive style matters more than credentials |
The “hedgehog vs. fox” framework, adapted from Isaiah Berlin’s essay, became a cornerstone of forecasting research. Hedgehogs “know one big thing”—they have a grand theory (Marxist, Libertarian, or otherwise) that they extend into many domains with great confidence. Foxes “know many little things”—they draw from eclectic traditions and improvise in response to changing events. Tetlock found that foxes demonstrated significantly better calibration and discrimination scores, particularly on long-term forecasts.
The Good Judgment Project (2011-2015)
Building on these findings, Tetlock co-led the Good Judgment Project (GJP), a multi-year IARPA-funded study of probability judgment accuracy. The project tested whether forecasting accuracy could be systematically improved through selection, training, and team structure.
| Component | Approach | Result |
|---|---|---|
| Participant Pool | Thousands of volunteer forecasters | Enabled large-scale experimentation |
| Training | Simple probability training exercises | Improved Brier scores significantly |
| Selection | Personality-trait tests for cognitive bias | Identified consistent top performers |
| Superforecasters | Top 2% across multiple seasons | Maintained accuracy over time and topics |
| Team Structure | Collaborative forecasting groups | Teams outperformed individuals |
Key findings included:
- Training exercises substantially improved forecast accuracy as measured by Brier scores
- The best forecasters (“superforecasters”) maintained consistent performance across years and question categories
- A log-odds extremizing aggregation algorithm outperformed competing aggregation methods (see the sketch after this list)
- GJP forecasts were reportedly 30% more accurate than those of intelligence analysts with access to classified information
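A minimal sketch of that aggregation step, assuming a simple mean in log-odds space and an illustrative extremizing factor (GJP fit the factor empirically; the value below is not theirs):

```python
import numpy as np

def extremized_aggregate(probs, alpha=2.5):
    """Average forecasts in log-odds space, then multiply by alpha > 1
    to push the aggregate away from 0.5 (extremizing). alpha here is
    illustrative, not the empirically fitted GJP value."""
    p = np.clip(np.asarray(probs, dtype=float), 1e-6, 1 - 1e-6)
    mean_log_odds = np.mean(np.log(p / (1 - p)))
    return 1.0 / (1.0 + np.exp(-alpha * mean_log_odds))

# Five forecasters lean the same way; extremizing sharpens the aggregate.
print(extremized_aggregate([0.6, 0.65, 0.7, 0.6, 0.75]))  # ~0.84 (simple mean: 0.66)
```

The intuition behind extremizing is that each forecaster holds only part of the available evidence, so a shared lean across the crowd justifies a more extreme aggregate than any individual reported.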
The project resulted in Superforecasting: The Art and Science of Prediction (2015), co-authored with Dan Gardner, which distilled principles of good forecasting: gather evidence from diverse sources, think probabilistically, work in teams, keep score, and remain willing to admit error.
From GJP to FRI
FRI represents the third phase of Tetlock’s research program. While the first phase established that experts are poorly calibrated and the second identified characteristics of accurate forecasters, FRI’s mission focuses on applying these insights to high-stakes policy questions—particularly existential risks where feedback loops are weak or nonexistent.
Key XPT Findings
For detailed XPT methodology, participant breakdown, and full analysis, see the dedicated XPT (Existential Risk Persuasion Tournament) page.
AI Progress Forecasting Accuracy
A 2025 follow-up analysis by Tetlock, Rosenberg, Kučinskas, Ceppas de Castro, Jacobs, and Karger evaluated how well XPT participants predicted three years of AI progress since summer 2022:
| Benchmark | Superforecasters | Domain Experts | Actual Outcome |
|---|---|---|---|
| IMO Gold by 2025 | 2.3% | 8.6% | Achieved July 2025 |
| MATH benchmark | 9.3% | 21.4% | Exceeded |
| MMLU benchmark | 7.2% | 25.0% | Exceeded |
| QuALITY benchmark | 20.1% | 43.5% | Exceeded |
| Average across benchmarks | 9.7% | 24.6% | All exceeded predictions |
Both groups systematically underestimated AI progress, but domain experts were closer to reality. Superforecasters initially thought an AI would achieve IMO Gold in 2035—a decade late. The only strategy that reliably worked was aggregating everyone’s forecasts: taking the median of all predictions produced substantially more accurate forecasts than any individual or group.
Existential Risk Estimates
| Risk Category | Superforecasters (Median) | Domain Experts (Median) | Ratio |
|---|---|---|---|
| Any catastrophe by 2100 | 9% | 20% | 2.2x |
| Any extinction by 2100 | 1% | 6% | 6x |
| AI-caused extinction by 2100 | 0.38% | 3% | 7.9x |
| Nuclear extinction by 2100 | 0.1% | 0.3% | 3x |
| Bio extinction by 2100 | 0.08% | 1% | 12.5x |
The ~8x gap between superforecasters (0.38%) and domain experts (3%) on AI-caused extinction represents one of the largest disagreements in the tournament. Notably, superforecasters gave higher probability to nuclear catastrophe (4%) than AI catastrophe (2.13%) by 2100, yet assigned roughly four times higher extinction risk to AI (0.38%) than to nuclear weapons (0.1%)—possibly because AI could “deliberately hunt down survivors.”
For comparison, existential risk researcher Toby Ord estimated a 16% total chance of extinction by 2100—16x higher than superforecasters and 2.5x higher than domain experts.
Conditional vs. Unconditional Risk
The XPT revealed how conditional framing affects risk estimates:
| Framing | Superforecaster Estimate |
|---|---|
| Unconditional AI extinction by 2100 | 0.38% |
| Conditional on AGI by 2070 | 1% |
| Increase factor | 2.6x |
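As a back-of-the-envelope reading of this table (our illustration, not an FRI calculation), the law of total probability connects the two framings:

$$
P(\text{AI ext. by 2100}) = P(\text{AI ext.} \mid \text{AGI by 2070})\,P(\text{AGI by 2070}) + P(\text{AI ext.} \mid \text{no AGI by 2070})\,P(\text{no AGI by 2070})
$$

If the second term is negligible (i.e., AI-caused extinction essentially requires AGI), the superforecaster numbers imply an implicit P(AGI by 2070) of roughly 0.38% / 1% ≈ 38%.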
Minimal Belief Updating
A striking finding was the minimal convergence of beliefs despite four months of structured debate with monetary incentives:
“Despite incentives to share their best arguments during four months of discussion, neither side materially moved the other’s views.”
The paper suggests this would be puzzling if participants were Bayesian agents but is less puzzling if participants were “boundedly rational agents searching for confirmatory evidence as the risks of embarrassing accuracy feedback receded.” Strong AI-risk proponents made particularly extreme long- but not short-range forecasts.
ForecastBench
ForecastBench is FRI’s dynamic, contamination-free benchmark for evaluating large language model forecasting capabilities, published at ICLR 2025.
ForecastBench Design
The benchmark was designed to solve the data contamination problem that plagues static AI benchmarks:
| Feature | Description |
|---|---|
| Dynamic Questions | 1,000 questions, continuously updated with new future-dated questions |
| Contamination-Free | All questions about events with no known answer at submission time |
| Multiple Baselines | Compares LLMs to superforecasters, public forecasters, and random chance |
| Open Submission | Public leaderboard for model comparison |
| Question Sources | Market questions (Manifold, Metaculus, Polymarket, RAND) and dataset questions (ACLED, DBnomics, FRED, Wikipedia, Yahoo Finance) |
| Funding | Supported by Coefficient Giving until mid-2027 |
The authors (Ezra Karger, Houtan Bastani, Chen Yueh-Han, Zachary Jacobs, Danny Halawi, Fred Zhang, and Philip E. Tetlock) designed ForecastBench as a “valuable proxy for general intelligence” since forecasting requires integrating diverse knowledge sources and reasoning under uncertainty.
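A hypothetical sketch of the contamination-free constraint (field names are ours, not ForecastBench’s actual data model): every question must resolve after both the model’s training cutoff and the forecast deadline.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ForecastQuestion:
    """Hypothetical record for a contamination-free benchmark question."""
    text: str
    source: str          # e.g. "Metaculus", "FRED", "ACLED"
    forecast_due: date   # deadline for submitting a probability
    resolution_date: date

    def is_contamination_free(self, model_cutoff: date) -> bool:
        # The outcome must still be unknown at the training cutoff
        # and at submission time, so no forecaster can look it up.
        return self.resolution_date > max(model_cutoff, self.forecast_due)

q = ForecastQuestion(
    text="Will the S&P 500 close higher on 2026-06-30 than on 2026-01-02?",
    source="Yahoo Finance",
    forecast_due=date(2026, 1, 2),
    resolution_date=date(2026, 6, 30),
)
print(q.is_contamination_free(model_cutoff=date(2025, 10, 1)))  # True
```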
ForecastBench Results
| Forecaster | Difficulty-Adjusted Brier Score | Notes |
|---|---|---|
| Superforecasters | 0.081 | Best overall performance |
| GPT-4.5 (Feb 2025) | 0.101 | Best LLM performance |
| GPT-4 (Mar 2023) | 0.131 | Baseline frontier model |
| Public Participants | ≈0.12 | LLMs now outperform non-experts |
| Random Baseline | 0.25 | Chance performance |
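For reference, the raw Brier score underlying these numbers is just the mean squared error between probability forecasts and binary outcomes; ForecastBench additionally applies a difficulty adjustment not reproduced in this sketch:

```python
def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared error between probabilities and 0/1 outcomes.
    Lower is better; a constant 0.5 forecast scores exactly 0.25,
    matching the random baseline in the table above."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

print(brier_score([0.5, 0.5], [1, 0]))  # 0.25 (chance)
print(brier_score([0.9, 0.1], [1, 0]))  # 0.01 (sharp and right)
```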
Key findings from ForecastBench:
| Finding | Evidence |
|---|---|
| Superforecasters still lead | The 0.054 Brier score gap between superforecasters and GPT-4o is larger than the 0.026 gap between GPT-4o and GPT-4 |
| Rapid LLM improvement | State-of-the-art LLM performance improves by ≈0.016 difficulty-adjusted Brier points annually |
| Projected parity | Linear extrapolation suggests LLMs will match superforecaster performance in November 2026 (95% CI: December 2025 – January 2028) |
| Initial models underperformed | Claude-3.5 Sonnet and GPT-4 Turbo initially performed roughly as well as a simple median of public forecasts |
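The projected-parity row comes from the paper’s regression over the full leaderboard history. A cruder version of the same logic, using only the headline scores in the results table above (which is why it lands earlier than the paper’s November 2026 estimate):

```python
from datetime import date, timedelta

# Back-of-the-envelope parity projection from the headline numbers above;
# the paper's own regression over the full leaderboard yields November 2026.
superforecaster = 0.081          # difficulty-adjusted Brier score
best_llm = 0.101                 # GPT-4.5, February 2025
improvement_per_year = 0.016     # annual gain in Brier points

years = (best_llm - superforecaster) / improvement_per_year   # 1.25
parity = date(2025, 2, 1) + timedelta(days=365.25 * years)
print(f"~{years:.2f} years -> {parity:%B %Y}")                # ~May 2026
```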
ForecastBench provides important empirical grounding for claims about AI forecasting capabilities, demonstrating measurable progress while showing that complex geopolitical and scientific questions remain challenging for LLMs.
Key Publications
| Publication | Year | Venue | Key Contribution |
|---|---|---|---|
| Forecasting Existential Risks: Evidence from a Long-Run Forecasting Tournament | 2023 | Working paper | XPT methodology and initial findings |
| Subjective-probability forecasts of existential risk | 2025 | International Journal of Forecasting | Peer-reviewed XPT results (Vol. 41, Issue 2, pp. 499-516) |
| ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities | 2025 | ICLR | LLM forecasting benchmark |
| Improving Judgments of Existential Risk | 2022 | SSRN Working Paper | Framework for better forecasts, questions, explanations, and policies |
| Can Humanity Achieve a Century of Nuclear Peace? | 2024 | FRI-ONN Report | Nuclear catastrophe probability estimates |
| Assessing Near-Term Accuracy in the XPT | 2025 | FRI Report | Retrospective accuracy analysis of 2022 forecasts |
Low-Probability Forecasting Challenges
FRI’s research addresses a critical methodological challenge: forecasting low-probability, high-consequence events like existential risks where traditional calibration feedback is unavailable.
Key Challenges
| Challenge | Issue | Manifestation in XPT |
|---|---|---|
| Base Rate Anchoring | Forecasters anchor too heavily on historical rates | May explain superforecaster underestimation of novel AI progress |
| Probability Compression | All “unlikely” events collapse to similar estimates | Extinction estimates cluster near 0-1% despite very different underlying mechanisms |
| Feedback Delays | Can’t learn from rare events | No extinction has occurred to calibrate against |
| Horizon Effects | Extreme estimates for distant futures | Strong AI-risk proponents gave extreme long- but not short-range forecasts |
| Confirmatory Search | Seeking evidence that confirms existing views | Neither side updated materially despite structured debate |
FRI Methodological Responses
| Method | Description | Application |
|---|---|---|
| Structured Scenario Analysis | Break down complex events into component paths | Decompose “AI extinction” into specific mechanisms (toy sketch below) |
| Adversarial Collaboration | Pair forecasters with opposing views | XPT Stage 3 debate structure |
| Cross-domain Calibration | Use accuracy on resolvable questions to weight long-run forecasts | Compare 2025-resolvable vs 2100 forecasts |
| Reciprocal Scoring | Methods for forecasting questions that may never resolve | Karger (2021) methodology paper |
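A toy illustration of the first row’s decomposition idea, with every probability invented for the example (FRI’s actual elicitations are far more granular):

```python
from math import prod

# Decompose a headline risk into roughly mutually exclusive causal paths,
# each a chain of conditional probabilities. All numbers are invented.
paths = {
    # P(AGI by 2100) x P(misaligned | AGI) x P(extinction | misaligned)
    "misaligned takeover": [0.5, 0.2, 0.10],
    # P(AGI by 2100) x P(weaponized | AGI) x P(extinction | weaponized)
    "catastrophic misuse": [0.5, 0.1, 0.05],
}

estimate = sum(prod(p) for p in paths.values())
print(f"Decomposed extinction estimate: {estimate:.2%}")  # 1.25%
```

Decomposition forces the forecaster to state where in the causal chain the probability mass lives, which counteracts the probability-compression failure described above.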
FRI-ONN Nuclear Risk Research
FRI collaborated with the Open Nuclear Network (ONN) in association with the University of Pennsylvania on a comprehensive nuclear catastrophe forecasting study.
Study Design
| Aspect | Details |
|---|---|
| Partner Organizations | FRI, Open Nuclear Network, University of Pennsylvania |
| Methodology | XPT-style structured elicitation with superforecasters and nuclear experts |
| Definition of Catastrophe | Event causing over 10 million deaths |
| Time Horizon | Probability estimates through 2045 |
| Presentation | 2024 NPT PrepCom in Geneva (July 25, 2024) |
| Publication | Can Humanity Achieve a Century of Nuclear Peace? |
Key Findings
| Finding | Estimate |
|---|---|
| Median expert probability of nuclear catastrophe by 2045 | 5% |
| Superforecaster probability | 1% |
| Most likely geopolitical source | Russia-NATO/USA tensions |
| Potential risk reduction | 50% if six key policies fully implemented |
Recommended Policy Interventions
The study identified six policies that could collectively reduce nuclear catastrophe risk by 50%:
- Establishing a secure crisis communications network
- Conducting comprehensive failsafe reviews of nuclear protocols
- Implementing enhanced early warning cooperation
- Adopting no-first-use declarations
- Reducing nuclear arsenal sizes
- Strengthening non-proliferation verification
Geneva Side Event Takeaways
The July 2024 side event “A Gamble of Our Own Choosing: Forecasting Nuclear Risks” highlighted:
- Forecasting combined with qualitative analysis is invaluable for understanding nuclear risks
- Need for more dynamic risk assessment methods
- Importance of communicating findings effectively to decision-makers
- Focus on near-term events enhances methodology credibility
Funding History
FRI’s work is primarily funded by Coefficient Giving, which launched forecasting as an independent cause area in 2024.
Coefficient Giving Grants to FRI
| Grant | Amount | Purpose | Date |
|---|---|---|---|
| Initial Planning Support | $175,000 | Planning work by Tetlock | Oct 2021 |
| Science of Forecasting | $1.3M (over 3 years) | Core research program, forecasting platform development | 2022 |
| General Support | $10M (over 3 years) | Expanded research program | 2023 |
| AI Progress Forecasting Panel | $1.07M (over 2 years) | Panel of AI experts forecasting capabilities, adoption, impacts | 2024 |
| Red-line Evaluations | $125,000 | Operationalizing AI red-line evaluations | 2024 |
| Tripwire Capability Evaluations | $158,850 | AI capability tripwire forecasting | 2024 |
| Forecasting Benchmark | $100,000 | Collaboration with Steinhardt lab on ForecastBench | 2024 |
| XPT Recognition Prize | $15,000 | Recognition for XPT publication | 2023 |
| Analysis of Historical Forecasts | $10,000 | Forecasting accuracy analysis | 2024 |
| AI Risk Discussion Project | $150,000 | Bringing together forecasters who disagree on AI x-risk | 2024 |
Total Coefficient Giving funding: Over $16 million
Comparison with Other Organizations
| Organization | Primary Method | Strength | Limitation |
|---|---|---|---|
| FRI | Methodology research, structured tournaments | Scientific rigor, peer-reviewed publications | Smaller scale, research-focused |
| Metaculus | Prediction aggregation platform | Scale, continuous questions, public access | Less methodological innovation |
| Epoch AI | Empirical AI trends analysis | Data quality, quantitative rigor | Less forecasting focus |
| Good Judgment Inc. | Commercial superforecaster panels | Proven accuracy, operational focus | Commercial rather than research mission |
| Polymarket | Prediction markets | Real-money incentives, liquidity | Regulatory constraints, short-term focus |
Implications for AI Safety
What the XPT Results Mean
The XPT findings have significant implications for how the AI safety community should interpret forecasts:
| Implication | Evidence | Action |
|---|---|---|
| Superforecasters may systematically underestimate AI progress | 2.5x gap on benchmark predictions; thought IMO Gold would occur in 2035 | Weight superforecaster AI timeline estimates with skepticism |
| Domain experts may be better calibrated on AI specifically | Closer to actual outcomes on MATH, MMLU, QuALITY, IMO | Give more weight to AI researcher predictions on AI questions |
| Aggregation outperforms individuals | Combined median was most accurate forecast | Use wisdom-of-crowds rather than individual expert opinions |
| Structured debate has limited impact | Minimal belief updating despite four months of discussion | Don’t expect debates to resolve fundamental disagreements |
| Long-range forecasts are particularly unreliable | Extreme positions taken on 2100 but not 2025 questions | Focus policy on near-term measurable outcomes |
The Calibration Paradox
FRI’s research reveals a paradox: superforecasters are selected specifically for their calibration on historical questions, yet they significantly underperformed on AI progress. This suggests that:
- Base-rate reasoning fails for unprecedented change: Superforecasters may anchor on historical rates of technological progress that don’t account for potential AI acceleration
- Domain expertise matters for novel domains: On questions requiring deep understanding of AI capabilities, specialists outperformed generalists
- Neither group is reliable for extinction risk: With no feedback available, even the best forecasters may be poorly calibrated
Recommendations from FRI Research
| Recommendation | Rationale |
|---|---|
| Weight domain expertise higher on AI | Experts outperformed superforecasters on AI questions |
| Use structured elicitation | Reduces some biases vs. simple aggregation |
| Decompose complex questions | Helps calibrate low-probability estimates |
| Track calibration by domain | Forecaster accuracy varies across topics |
| Invest in resolvable benchmarks | Near-term forecasts provide calibration feedback |
| Combine multiple forecaster types | Aggregation across groups worked best |
Team and Leadership
Core Leadership
| Role | Person | Background |
|---|---|---|
| Chief Scientist | Philip Tetlock | Annenberg University Professor at UPenn, author of Superforecasting and Expert Political Judgment, Good Judgment Project co-founder, elected to American Philosophical Society (2019) |
| CEO | Josh Rosenberg | Organizational leadership and operations |
| Research Director | Ezra Karger | Senior Economist at Federal Reserve Bank of Chicago, research in labor economics, public economics, and forecasting |
Research Team
According to FRI’s team page, the organization includes:
| Team Member | Focus Area |
|---|---|
| Michael Page | Research operations |
| Tegan McCaslin | Research |
| Zachary Jacobs | Research, ForecastBench development |
| Various contractors | External collaborators in forecasting |
Academic Affiliations
| Institution | Affiliation |
|---|---|
| University of Pennsylvania | Tetlock’s primary appointment (Wharton School + School of Arts and Sciences) |
| Federal Reserve Bank of Chicago | Karger’s primary appointment |
| NBER | Karger is NBER affiliate |
Timeline
| Date | Event |
|---|---|
| 1984-2003 | Tetlock conducts Expert Political Judgment study (284 experts, 28,000 forecasts) |
| 2005 | Expert Political Judgment published by Princeton University Press |
| 2011 | Good Judgment Project launched with IARPA funding |
| 2015 | Superforecasting published; GJP concludes after winning the IARPA forecasting tournament |
| October 2021 | FRI founded with $175K Coefficient Giving planning grant |
| June-October 2022 | XPT tournament conducted (169 participants, 4 months) |
| 2022 | Coefficient Giving provides $1.3M multi-year grant |
| August 2023 | XPT working paper released |
| 2023 | Coefficient Giving provides $10M general support grant |
| July 2024 | FRI-ONN nuclear risk study presented at NPT PrepCom in Geneva |
| September 2024 | ForecastBench launched |
| October 2024 | Nuclear risk report published |
| January 2025 | ForecastBench paper published at ICLR 2025 |
| 2025 | XPT results published in International Journal of Forecasting |
| September 2025 | XPT near-term accuracy follow-up published |
Strengths and Limitations
Strengths
| Strength | Evidence |
|---|---|
| Methodological rigor | Peer-reviewed publications in top venues (ICLR, Int. Journal of Forecasting) |
| Leadership credentials | Tetlock’s four decades of forecasting research, American Philosophical Society membership |
| Innovation | XPT methodology, ForecastBench, structured elicitation techniques |
| Policy relevance | Nuclear risk work presented at NPT PrepCom, AI policy applications |
| Independence | Research nonprofit with philanthropic rather than commercial funding |
| Quantitative findings | Specific probability estimates with documented methodology |
Limitations
| Limitation | Context |
|---|---|
| Scale | 169 XPT participants vs. thousands on platforms like Metaculus |
| Speed | Research focus means slower output than real-time forecasting platforms |
| Cost | Intensive methodology requires significant resources per study |
| Generalizability | Tournament findings may not transfer to all forecasting contexts |
| Long-range uncertainty | No ground truth available for existential risk calibration |
| Minimal updating | XPT showed debates had limited impact on beliefs |
Open Questions
| Question | Relevance |
|---|---|
| Should policy weight superforecasters or domain experts on AI? | XPT suggests experts may be better calibrated for AI specifically |
| Can LLMs eventually match superforecasters? | ForecastBench suggests parity by late 2026 |
| How should we interpret minimal belief updating? | May reflect genuine irreducible uncertainty or cognitive limitations |
| What forecasting methods work for unprecedented events? | Neither group was well-calibrated on AI progress |
External Links
Sources
- Forecasting Research Institute - Official website
- Philip Tetlock | Penn Psychology - Academic profile
- Ezra Karger - Research Director profile
- Subjective-probability forecasts of existential risk (Int. Journal of Forecasting) - Peer-reviewed XPT results
- ForecastBench (ICLR 2025) - LLM forecasting benchmark paper
- The Extinction Tournament (Astral Codex Ten) - Scott Alexander analysis
- What did forecasters get right and wrong? (FRI Substack) - 2025 accuracy retrospective
- Ezra Karger on existential risk forecasting (80,000 Hours) - Podcast interview
- Coefficient Giving Forecasting Grants - Funding history
- FRI-ONN Nuclear Risk Project - Nuclear forecasting collaboration
- Can Humanity Achieve a Century of Nuclear Peace? - Nuclear risk report
- Expert Political Judgment (Princeton) - Tetlock’s 2005 book
- The Good Judgment Project (Wikipedia) - GJP background
- Philip E. Tetlock (Wikipedia) - Biography
- Announcing FRI (EA Forum) - Original announcement
- XPT Forecasts on AI Risk (EA Forum) - Community analysis