Longterm Wiki

Forecasting AI Futures: AGI Insights from Prediction Markets

blog

Data Status

Not fetched

Cited by 1 page

Page | Type | Quality
Metaculus | Organization | 50.0

Cached Content Preview

HTTP 200 · Fetched Feb 26, 2026 · 192 KB
Forecasting AGI: Insights from Prediction Markets and Metaculus

Examining what the forecasting communities think about AGI arrival

Forecasting AI Futures · Alvin Ånestrand · Feb 04, 2025

I have tried to find all prediction market and Metaculus questions related to AGI timelines. Here I examine how they compare to each other, and what they actually say about when AGI might arrive. If you know of a market that I have missed, please tell me in the comment section! It would also be helpful to hear about questions you think are relevant but missing from this analysis.

Whenever possible, please check the more recent probability estimates on the embedded sites instead of relying on my At The Time Of Writing (ATTOW) numbers.

So, what do prediction markets and Metaculus have to say about AGI? Metaculus has this question for the arrival date of AGI. To resolve it, the AI system needs to be able to:

- Pass a really hard Turing test.
- Have general robotic capabilities (being able to assemble a "circa-2021 Ferrari 312 T4 1:8 scale automobile model" or equivalent).
- Achieve "at least 75% accuracy in every task and 90% mean accuracy across all tasks" on the MMLU benchmark, which measures expertise in a wide range of academic subjects.
- Achieve at least 90% accuracy with a single attempt for each question on the APPS benchmark, which measures coding skills.

Metaculus thinks this will probably occur around the middle of 2030, though with high uncertainty: the interval between the lower and upper quartiles of the individual predictions on this question is 2026-12-28 to 2039-03-27 ATTOW.

GPT-4o achieves an accuracy of 88.7% on MMLU, as seen in the leaderboard here. GPT-4 was used to reach 22% accuracy on APPS. Unfortunately, most of the best models have not been tested on either MMLU or APPS. OpenAI's o3 has been reported to achieve 71.7% on SWE-bench Verified.
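The MMLU resolution condition above has two parts (a per-task floor and an overall mean), which is easy to get wrong when eyeballing leaderboard numbers. A minimal sketch of the check, using made-up illustrative accuracies rather than any real model's scores:

```python
def meets_mmlu_criterion(task_accuracies):
    """Check the Metaculus MMLU condition: at least 75% accuracy on
    every individual task AND at least 90% mean accuracy overall."""
    every_task_ok = all(acc >= 0.75 for acc in task_accuracies)
    mean_ok = sum(task_accuracies) / len(task_accuracies) >= 0.90
    return every_task_ok and mean_ok

# Illustrative values only: min 0.88 >= 0.75 and mean 0.915 >= 0.90
print(meets_mmlu_criterion([0.95, 0.92, 0.88, 0.91]))  # True
# A high mean is not enough if a single task falls below the 75% floor
print(meets_mmlu_criterion([0.99, 0.99, 0.70, 0.99]))  # False
```

Note that a headline MMLU average (like GPT-4o's 88.7%) alone cannot tell you whether the criterion is met; the per-task breakdown matters.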
We can compare o3's SWE-bench result to GPT-4, which achieved 22.4% on SWE-bench Verified and 22% accuracy on APPS. Based on this, I think o3 would manage above 50% accuracy on APPS.

The two criteria that AI currently seems furthest from fulfilling are the robotic capabilities and APPS accuracy, though current best performance on the APPS benchmark is uncertain. Coding capabilities are improving very fast, as indicated by the rapid accuracy gains on SWE-bench Verified, while robotics capabilities are lagging behind. If there are not too many errors in the APPS benchmark dataset, robotic skills are probably the limiting factor.

But even though this is an interesting thing to forecast, the first AI system fulfilling these criteria is not necessarily a true AGI. Like the current SOTA general-purpose AIs, which are getting really good at things like answering graduate-level questions (o3 achieves 87.7% accuracy on GPQA) and coding, the systems are not really agentic enough yet to, for example, replac

... (truncated, 192 KB total)
Resource ID: 1112345fa5280525 | Stable ID: Yjk2YjRkMj