FutureSearch
FutureSearch is an AI forecasting startup founded by former Metaculus leaders that combines LLM research agents with human judgment, demonstrating some prediction accuracy but facing uncertain commercial viability and limited proven impact on AI safety decisions. While the company contributes to AGI timeline discussions and has innovative hybrid forecasting approaches, its small scale and early-stage nature limit its current significance for AI risk understanding.
Quick Assessment
| Dimension | Assessment |
|---|---|
| Type | AI forecasting platform and research company |
| Founded | 2023 |
| Founders | Dan Schwarz (CEO), Lawrence Phillips (CTO) |
| Funding | $5.79M seed round (Dec 2024)1 |
| Focus Areas | AI forecasting, LLM research agents, predictive analytics |
| Notable Work | Accurate OpenAI revenue forecasts, AI 2027 report contributions, Deep Research Bench |
| Key Innovation | Hybrid human-AI forecasting outperforming pure human predictions |
Key Links
| Source | Link |
|---|---|
| Official Website | futuresearch.ai |
| Wikipedia | en.wikipedia.org |
Overview
FutureSearch is an AI forecasting startup founded in 2023 by Dan Schwarz and Lawrence Phillips, both former leaders at Metaculus, the crowdsourced prediction platform.2 The company develops intelligent data tools and LLM-powered research agents designed to enhance forecasting accuracy through hybrid human-AI approaches. Its flagship products include "everyrow," which orchestrates LLM agents on datasets, and various forecasting tools serving clients like xAI.3
The company gained recognition in 2024 when, by its own account, it became the first organization to demonstrate an autonomous AI forecaster that was a net winner of bets against humans on geopolitical questions.4 This milestone illustrated the potential of AI-assisted forecasting to augment human judgment in complex prediction tasks. FutureSearch's approach emphasizes producing legible reasoning and transparent models alongside final predictions, addressing what a FutureSearch post on the EA Forum calls the "rationale-shaped hole at the heart of forecasting."5
Beyond commercial applications, FutureSearch has contributed to AI safety discourse through its work on AGI timeline forecasting and evaluations of frontier AI labs' safety practices. The company collaborated with the AI Futures Project by contributing core research to the "AI 2027" report, which examines scenarios where superhuman AI capabilities could emerge by 2027, and has since assessed which lab policies might mitigate existential risks.6
History and Founding
FutureSearch was established in 2023 in San Bruno, California, shortly after the release of GPT-4.7 The founding team brought deep expertise from the forecasting community: Dan Schwarz previously served as CTO at Metaculus and worked as a Senior Software Engineer at Google and Waymo, where he created Google's internal prediction market.8 Lawrence Phillips, the co-founder and CTO, led the AI team at Metaculus and held machine learning positions at Cambridge Consultants, GTN, Jawbone Health, and GSK.9
The company's early team included Daniel Hnyk (Founding Software Engineer, former Director of Engineering at GWI and Metaculus), Peter Mühlbacher (Founding Research Scientist), and Christoph Sträter (Research Scientist with a PhD in Theoretical Physics from the Max Planck Institute).10 This combination of forecasting experience, AI expertise, and research capabilities positioned FutureSearch to pursue hybrid human-AI approaches to prediction.
In December 2024, FutureSearch raised $5.79 million in a seed funding round led by Metaplanet Holdings.11 The funds were allocated toward initial platform launch, refining the forecasting engine, expanding the engineering team, and implementing go-to-market strategies targeting enterprise customers. The company operates with approximately six employees as of early 2025.12
FutureSearch also received a $356,000 grant from Coefficient Giving through Good Ventures to support research on automating the generation of forecasting questions.13 This grant reflects the organization's interest in advancing forecasting methodology alongside commercial product development.
Research and Capabilities
FutureSearch has published multiple research papers and benchmarks examining AI reasoning and forecasting capabilities. Key publications include "Bench to the Future: A Pastcasting Benchmark for Forecasting Agents" (May 2025) and "Deep Research Bench: Evaluating AI Web 'Search' and 'Research' Agents" (May 2025), along with a corresponding leaderboard launched in June 2025.14 These benchmarks evaluate how AI systems perform on research tasks requiring web search, information synthesis, and predictive judgment.
The company has also produced applied forecasts demonstrating its methodology. In June 2024, FutureSearch generated forecasts for OpenAI's business metrics that proved notably accurate when OpenAI disclosed figures in September 2024.15 The platform estimated 7.7 million ChatGPT Plus subscribers and approximately $500 million in API Annual Recurring Revenue (ARR), in contrast to widely cited reporting from The Information that put API ARR at $1 billion as of March 2024. After OpenAI announced over 10 million subscribers later that year, FutureSearch judged its Plus estimate to have been much more accurate than other public estimates at the time. However, the company acknowledged overestimating ChatGPT Teams adoption at 900,000 subscribers when the actual figure was likely closer to 100,000.16
Additional research outputs include "A Guide for LLM Assisted Web Research" (June 2025), "Stockfisher," an AI-powered value stock screener for long-term investors (June 2025), and financial forecasts such as "OpenAI's Financial Forecast 2025-2027: Revenue, Losses & Profitability Analysis" (May 2025).17 The company has also investigated practical challenges in AI research, including an analysis of the limitations of OpenAI's Deep Research agent.
CEO Dan Schwarz has stated that hybrid systems combining human forecasters with machine intelligence already outperform human-only forecasting, while AI-only approaches still fall short, and that the company's tools help users become more accurate in their predictions.18 The company also points to testimonials from research nonprofits using the platform.
AI Safety and AGI Timelines
FutureSearch has engaged with questions relevant to AI safety through its forecasting work on AGI timelines and frontier lab safety policies. The company contributed research to the "AI 2027" analysis examining scenarios where artificial general intelligence could emerge by 2027 through recursive self-improvement.19 In its follow-up post "AI 2027, 6 Months Later," FutureSearch noted that it had predicted that "Superhuman Coders," a key milestone toward AGI, would take about three times longer than the other AI Futures forecasters predicted.20
The company's analysis highlighted Anthropic as the most safety-conscious frontier lab due to its Responsible Scaling Policy (RSP), which could reduce the probability of an "AI 2027"-like scenario by providing intervention mechanisms at high risk levels.21 This evaluation reflects FutureSearch's interest in assessing how different organizational approaches to AI development might affect existential risk trajectories.
However, the analysis also noted concerns that if labs like Anthropic prioritize internal R&D speedup alongside safety measures, it could heighten risks from rapid AGI timelines. The forecasts emphasize the importance of lab policies that genuinely constrain development at critical thresholds rather than merely documenting safety efforts.22
Technology and Products
FutureSearch's core technology combines LLM-powered research agents with structured forecasting methodologies. The "everyrow" tool enables users to run LLM agents on every row of a dataset, facilitating large-scale automated research and data analysis.23 This approach allows for systematic investigation of questions that would be prohibitively time-consuming for human researchers working alone.
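The row-wise pattern described above can be pictured with a short sketch. This is not everyrow's actual API, which the sources cited here do not document: `research_agent`, `run_on_every_row`, and the sample rows are all hypothetical, and the agent is a stub standing in for an LLM call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Row = dict[str, str]


def research_agent(row: Row) -> Row:
    """Hypothetical stand-in for an LLM research agent.

    A real agent would search the web and reason about the row; this stub
    only builds the question it would investigate.
    """
    question = f"What is the latest annual revenue of {row['company']}?"
    return {**row, "question": question, "answer": "TODO: agent output"}


def run_on_every_row(
    rows: list[Row], agent: Callable[[Row], Row], max_workers: int = 4
) -> list[Row]:
    """Apply `agent` to each row concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(agent, rows))


# Placeholder rows; the company names are invented for illustration.
rows = [{"company": "ExampleCo"}, {"company": "SampleCorp"}]
results = run_on_every_row(rows, research_agent)
```

Running rows in a thread pool reflects the fact that each agent call would be dominated by network latency, so rows can be processed in parallel while the output keeps the dataset's original order.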
The platform emphasizes legible reasoning alongside final probability estimates. This design responds to longstanding critiques in the forecasting community that valuable models and reasoning often remain hidden behind final numbers.24 By surfacing the research process and causal models underlying predictions, FutureSearch aims to provide decision-makers with actionable insights rather than opaque probability distributions.
The company serves enterprise clients across multiple domains, with demonstrated applications in financial forecasting (stock fundamentals, revenue projections), geopolitical prediction, and AI capability assessments. Customers of its research include xAI, though specific client relationships and use cases are not publicly detailed.25
Reception and Community Discussion
Within the effective altruism community, FutureSearch has generated discussion about the nature and value of forecasting as an intervention. Supporters appreciate the platform's emphasis on producing legible models and reasons alongside forecasts, viewing this as addressing a critical gap where "forecasters produce reasons and models that are often more valuable than the final forecasts" but these insights are typically lost.26 Some commentators position FutureSearch as contributing to a broader "rationality engine" that could improve decision-making beyond traditional prediction markets.
However, the platform has also faced framing critiques. Some EA community members argue that FutureSearch should be characterized as "in-depth research with a forecasting flavor" rather than "in-depth forecasting," given that effective altruism typically associates forecasting with Tetlock-style superforecasting and aggregation of part-time predictors.27 This framing disagreement reflects broader questions about what forecasting methodologies are most valuable for different applications.
More fundamentally, skeptics within the EA community question whether intensive forecasting represents an effective use of resources, viewing it as potentially "nerd-sniping activity"—addictive leaderboard competition rather than high-impact work.28 These critiques apply to forecasting broadly rather than to FutureSearch specifically, but they contextualize debates about the platform's value proposition.
Criticisms and Limitations
While FutureSearch's forecasting methodology has demonstrated some successes, the company has acknowledged limitations in its predictions. The significant overestimate of ChatGPT Teams adoption (900,000 predicted versus approximately 100,000 actual subscribers) illustrates challenges in forecasting rapidly evolving markets with limited public data.29
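To put that miss in proportion, the size of the error can be worked out directly from the two figures quoted above. This is a minimal sketch: the function name is hypothetical, and the 100,000 "actual" value is itself only the approximate figure the company reported.

```python
def overestimate_ratio(forecast: float, actual: float) -> float:
    """Return how many times larger the forecast was than the actual value."""
    return forecast / actual


teams_forecast = 900_000  # FutureSearch's ChatGPT Teams estimate
teams_actual = 100_000    # approximate actual figure, per the company

ratio = overestimate_ratio(teams_forecast, teams_actual)
pct_error = (teams_forecast - teams_actual) / teams_actual * 100

print(f"{ratio:.0f}x forecast/actual, {pct_error:.0f}% above actual")
# prints "9x forecast/actual, 800% above actual"
```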
More broadly, research published by FutureSearch itself highlights challenges facing LLM-based forecasting and research tools. Its analysis of OpenAI's Deep Research agent identified overconfidence (reporting wrong answers confidently instead of admitting uncertainty), peculiar source selection (choosing company blogs or SEO-spam sites over authoritative sources), and a risk of misinformation by omission, where incomplete research appears comprehensive.30 These documented limitations suggest that hybrid human-AI approaches, while promising, require careful design to avoid amplifying rather than mitigating human cognitive biases.
The platform's small team size (approximately six employees) and early-stage funding may constrain its ability to compete with larger organizations developing AI research tools. Companies like Anthropic and OpenAI are developing research-oriented AI systems with substantially more resources, potentially limiting FutureSearch's market position.
Key Uncertainties
Several important questions remain about FutureSearch's trajectory and impact:
Forecasting accuracy at scale: While the company demonstrated success in its June 2024 OpenAI forecasts, it remains unclear whether this level of accuracy generalizes across different domains and time horizons. Systematic evaluation of the platform's track record compared to alternative forecasting methods would clarify its value proposition.
Commercial viability: FutureSearch operates in a competitive landscape where larger AI companies are developing similar research and reasoning tools. Whether the company can sustain a profitable business model while maintaining forecasting quality remains uncertain. The relatively small seed funding round ($5.79M) suggests either capital efficiency or potential challenges in attracting larger investments.
Impact on AI safety discourse: FutureSearch's contributions to AGI timeline forecasting and frontier lab evaluations have entered EA and AI safety discussions, but the ultimate influence of these forecasts on organizational decision-making and policy is unclear. Forecasts about AI timelines face inherent difficulties given the unprecedented nature of AGI development and limited historical precedents for analogous transitions.
Methodology transparency: While FutureSearch emphasizes legible reasoning, the extent to which its commercial forecasting methodology is transparent to external evaluators remains limited. Greater documentation of how human and AI contributions combine, how sources are weighted, and how uncertainty is quantified would enable better assessment of the platform's epistemics.
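One common way to make the track-record comparison raised under "Forecasting accuracy at scale" concrete is the Brier score, the mean squared error between forecast probabilities and binary outcomes (lower is better). The sketch below is illustrative only: the probabilities and outcomes are made-up toy values, and the equal-weight linear blend of human and AI forecasts is an assumption, not FutureSearch's documented method.

```python
def brier_score(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def blend(human: list[float], ai: list[float], ai_weight: float = 0.5) -> list[float]:
    """Linear pool of two forecasters; `ai_weight` is an assumed parameter."""
    return [(1 - ai_weight) * h + ai_weight * a for h, a in zip(human, ai)]


# Toy data, invented for illustration (not real forecasts or results).
outcomes = [1, 0, 1, 0]
human = [0.7, 0.4, 0.6, 0.2]
ai = [0.9, 0.1, 0.5, 0.4]
hybrid = blend(human, ai)

for name, probs in [("human", human), ("ai", ai), ("hybrid", hybrid)]:
    print(f"{name}: Brier = {brier_score(probs, outcomes):.4f}")
```

Publishing scores like these across domains and time horizons, together with the weights used to combine human and AI inputs, is the kind of documentation that would let external evaluators assess the claims discussed above.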
Sources
Footnotes
- FutureSearch Funding - Fundz.net
- FutureSearch Company Overview
- FutureSearch gives odds of runaway AI in new AI futurism report - PRWeb
- The rationale-shaped hole at the heart of forecasting - EA Forum
- AI 2027, 6 Months Later - FutureSearch
- The Secret Sauce of Accurate AI Forecasting - Apple Podcasts
- FutureSearch Funding - Exa.ai Directory
- FutureSearch Offices - Exa.ai Directory
- Good Ventures - FutureSearch Forecasting Question Automation Grant
- FutureSearch Research Publications
- OpenAI Case Study - FutureSearch
- OpenAI Case Study - FutureSearch
- FutureSearch Research Publications
- FutureSearch gives odds of runaway AI - PRWeb
- AI 2027, 6 Months Later - FutureSearch
- AI 2027, 6 Months Later - FutureSearch
- AI 2027, 6 Months Later - FutureSearch
- The rationale-shaped hole at the heart of forecasting - EA Forum
- FutureSearch Company Overview
- Citation rc-0a24 (data unavailable)
- The rationale-shaped hole at the heart of forecasting - EA Forum
- Is Forecasting a Promising EA Cause Area? - EA Forum
- OpenAI Case Study - FutureSearch
- OpenAI Deep Research February 2025 - FutureSearch