Longterm Wiki

Forecasting AI Futures: AGI Insights from Prediction Markets

blog

Data Status

Not fetched

Cited by 1 page

Page | Type | Quality
Metaculus | Organization | 50.0

Cached Content Preview

HTTP 200 · Fetched Feb 26, 2026 · 192 KB
Forecasting AGI: Insights from Prediction Markets and Metaculus

Examining what the forecasting communities think about AGI arrival

Forecasting AI Futures · Alvin Ånestrand · Feb 04, 2025

I have tried to find all prediction market and Metaculus questions related to AGI timelines. Here I examine how they compare to each other, and what they actually say about when AGI might arrive. If you know of a market that I have missed, please tell me in the comment section! It would also be helpful to hear about questions you think are relevant but missing from this analysis.

Whenever possible, please check the more recent probability estimates on the embedded sites instead of relying on my At The Time Of Writing (ATTOW) numbers.

So, what do prediction markets and Metaculus have to say about AGI? Metaculus has this question for the arrival date of AGI. To resolve it, the AI system needs to be able to:

- Pass a really hard Turing test.
- Have general robotic capabilities (being able to assemble a "circa-2021 Ferrari 312 T4 1:8 scale automobile model" or equivalent).
- Achieve "at least 75% accuracy in every task and 90% mean accuracy across all tasks" on the MMLU benchmark, which measures expertise in a wide range of academic subjects.
- Achieve at least 90% accuracy with a single attempt for each question on the APPS benchmark, which measures coding skills.

Metaculus thinks this will probably occur around the middle of 2030, though with high uncertainty: the interval between the lower and upper quartiles of the individual predictions on this question is 2026-12-28 to 2039-03-27 ATTOW.

GPT-4o achieves an accuracy of 88.7% on MMLU, as seen in the leaderboard here. GPT-4 was used to reach 22% accuracy on APPS. Unfortunately, most of the best models have not been tested on either MMLU or APPS. OpenAI's o3 has been reported to achieve 71.7% on SWE-bench Verified.
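The MMLU resolution condition above has two parts (a per-task floor and an overall mean), which is easy to get wrong when eyeballing leaderboard numbers. A minimal sketch of the check, using made-up illustrative accuracies rather than any real model's scores:

```python
def meets_mmlu_criterion(task_accuracies):
    """Check the Metaculus MMLU condition: at least 75% accuracy on
    every individual task AND at least 90% mean accuracy overall."""
    every_task_ok = all(acc >= 0.75 for acc in task_accuracies)
    mean_ok = sum(task_accuracies) / len(task_accuracies) >= 0.90
    return every_task_ok and mean_ok

# Illustrative values only: min 0.88 >= 0.75 and mean 0.915 >= 0.90
print(meets_mmlu_criterion([0.95, 0.92, 0.88, 0.91]))  # True
# A high mean is not enough if a single task falls below the 75% floor
print(meets_mmlu_criterion([0.99, 0.99, 0.70, 0.99]))  # False
```

Note that a headline MMLU average (like GPT-4o's 88.7%) alone cannot tell you whether the criterion is met; the per-task breakdown matters.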
We can compare o3's SWE-bench result to GPT-4, which achieved 22.4% on SWE-bench Verified and 22% accuracy on APPS. Based on this, I think o3 would manage above 50% accuracy on APPS.

The two criteria that AI currently seems furthest from fulfilling are the robotic capabilities and APPS accuracy, though current best performance on the APPS benchmark is uncertain. Coding capabilities are improving very fast, as indicated by the rapid accuracy gains on SWE-bench Verified, while robotics capabilities are lagging behind. If there are not too many errors in the APPS benchmark dataset, robotic skills are probably the limiting factor.

But even though this is an interesting thing to forecast, the first AI system fulfilling these criteria is not necessarily a true AGI. Like the current SOTA general-purpose AIs, which are getting really good at things like answering graduate-level questions (o3 achieves 87.7% accuracy on GPQA) and coding, the systems are not really agentic enough yet to, for example, replac

... (truncated, 192 KB total)
Resource ID: 1112345fa5280525 | Stable ID: Yjk2YjRkMj