FutureSearch


Last edited: 2026-02-01


LLM Summary: FutureSearch is an AI forecasting startup founded by former Metaculus leaders that combines LLM research agents with human judgment, demonstrating some prediction accuracy but facing uncertain commercial viability and limited proven impact on AI safety decisions. While the company contributes to AGI timeline discussions and has innovative hybrid forecasting approaches, its small scale and early-stage nature limit its current significance for AI risk understanding.


Quick Assessment

Dimension	Assessment
Type	AI forecasting platform and research company
Founded	2023
Founders	Dan Schwarz (CEO), Lawrence Phillips (CTO)
Funding	$5.79M seed round (Dec 2024)¹
Focus Areas	AI forecasting, LLM research agents, predictive analytics
Notable Work	Accurate OpenAI revenue forecasts, AI 2027 report contributions, Deep Research Bench
Key Innovation	Hybrid human-AI forecasting, which the company reports outperforms purely human predictions

Key Links

Source	Link
Official Website	futuresearch.ai
Wikipedia	en.wikipedia.org

Overview

FutureSearch is an AI forecasting startup founded in 2023 by Dan Schwarz and Lawrence Phillips, both former leaders at Metaculus, the crowdsourced prediction platform.² The company develops data tools and LLM-powered research agents designed to improve forecasting accuracy through hybrid human-AI approaches. Its products include “everyrow,” which orchestrates LLM agents on datasets, and forecasting tools used by clients such as xAI.³

The company gained recognition in 2024 when it became the first organization to demonstrate an autonomous AI forecaster that achieved net wins in bets against humans on geopolitical questions.⁴ This milestone illustrated the potential of AI-assisted forecasting to augment human judgment in complex prediction tasks. FutureSearch’s approach emphasizes producing legible reasoning and transparent models alongside final predictions, addressing what some effective altruism community members describe as the “rationale-shaped hole at the heart of forecasting.”⁵

Beyond commercial applications, FutureSearch has contributed to AI safety discourse through its work on AGI timeline forecasting and evaluations of frontier AI labs’ safety practices. The company participated in the “AI 2027” report analysis, examining scenarios where superhuman AI capabilities could emerge by 2027 and assessing which lab policies might mitigate existential risks.⁶

History and Founding

FutureSearch was established in 2023 in San Bruno, California, shortly after the release of GPT-4.⁷ The founding team brought deep expertise from the forecasting community: Dan Schwarz previously served as CTO at Metaculus and worked as a Senior Software Engineer at Google, where he created the company’s internal prediction market, and at Waymo.⁸ Lawrence Phillips, the co-founder and CTO, led the AI team at Metaculus and held machine learning positions at Cambridge Consultants, GTN, Jawbone Health, and GSK.⁹

The company’s early team included Daniel Hnyk (Founding Software Engineer, former Director of Engineering at GWI and Metaculus), Peter Mühlbacher (Founding Research Scientist), and Christoph Sträter (Research Scientist with a PhD in Theoretical Physics from the Max Planck Institute).¹⁰ This combination of forecasting experience, AI expertise, and research capabilities positioned FutureSearch to pursue hybrid human-AI approaches to prediction.

In December 2024, FutureSearch raised $5.79 million in a seed funding round led by Metaplanet Holdings.¹¹ The funds were earmarked for the initial platform launch, refinement of the forecasting engine, expansion of the engineering team, and go-to-market efforts targeting enterprise customers. The company had approximately six employees as of early 2025.¹²

FutureSearch also received a $356,000 grant from Open Philanthropy through Good Ventures to support research on automating the generation of forecasting questions.¹³ This grant reflects the organization’s interest in advancing forecasting methodology alongside commercial product development.

Research and Capabilities

FutureSearch has published multiple research papers and benchmarks examining AI reasoning and forecasting capabilities. Key publications include “Bench to the Future: A Pastcasting Benchmark for Forecasting Agents” (May 2025) and “Deep Research Bench: Evaluating AI Web ‘Search’ and ‘Research’ Agents” (May 2025), along with a corresponding leaderboard launched in June 2025.¹⁴ These benchmarks evaluate how AI systems perform on research tasks requiring web search, information synthesis, and predictive judgment.

The company has also produced applied forecasts demonstrating its methodology. In June 2024, FutureSearch published forecasts of OpenAI’s business metrics, several of which proved close to the figures OpenAI disclosed in September 2024.¹⁵ FutureSearch estimated 7.7 million ChatGPT Plus subscribers and approximately $500 million in API annual recurring revenue (ARR), half the $1 billion API ARR claimed in other industry estimates. OpenAI’s announcement later that year of more than 10 million subscribers was taken as support for the methodology, though the two figures refer to different points in time. The company also acknowledged a large miss: it estimated 900,000 ChatGPT Teams subscribers when the actual figure was closer to 100,000.¹⁶
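The size of these gaps is easier to compare as ratios. The short calculation below uses only the rounded figures quoted in this section; the function name is illustrative and the inputs are approximations, not fresh data.

```python
# Ratios for the rounded figures quoted in this section.
# The inputs are the article's approximate numbers, not independent data.

def ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, rejecting a non-positive denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator

# ChatGPT Teams: forecast of 900,000 versus roughly 100,000 actual subscribers.
teams_overestimate = ratio(900_000, 100_000)

# API ARR: FutureSearch's ~$500M estimate versus the ~$1B claimed elsewhere.
api_vs_other_estimates = ratio(500e6, 1e9)

print(f"Teams forecast / actual: {teams_overestimate:.1f}x")
print(f"FutureSearch API ARR / other estimates: {api_vs_other_estimates:.2f}")
```

On these figures the Teams forecast was roughly nine times too high, while the API ARR estimate was half of the competing industry figure.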

Additional research outputs include “A Guide for LLM Assisted Web Research” (June 2025), analysis tools like “Stockfisher” for financial forecasting (June 2025), and comprehensive forecasts such as “OpenAI’s Revenue in 2027.”¹⁷ The company has also investigated practical challenges in AI research, including analyses of AI hallucination trends and limitations in AI research agents.

CEO Dan Schwarz has stated that hybrid human-AI forecasting outperforms purely human forecasting and that the company’s tools make users more accurate in their predictions.¹⁸ The support cited for this claim is testimonials from research nonprofits using the platform.

AI Safety and AGI Timelines

FutureSearch has engaged with questions relevant to AI safety through its forecasting work on AGI timelines and frontier lab safety policies. The company contributed research to the “AI 2027” analysis, which examines scenarios where artificial general intelligence could emerge by 2027 through recursive self-improvement.¹⁹ When it revisited the AI 2027 scenario in 2026, FutureSearch predicted that reaching “Superhuman Coders,” a key milestone toward AGI, would take approximately three times longer than other AI futures scenarios suggested.²⁰

The company’s analysis highlighted Anthropic as the most safety-conscious frontier lab due to its Responsible Scaling Policy (RSP), which could reduce the probability of an “AI 2027”-like scenario by providing intervention mechanisms at high risk levels.²¹ This evaluation reflects FutureSearch’s interest in assessing how different organizational approaches to AI development might affect existential risk trajectories.

However, the analysis also noted concerns that if labs like Anthropic prioritize internal R&D speedup alongside safety measures, it could heighten risks from rapid AGI timelines. The forecasts emphasize the importance of lab policies that genuinely constrain development at critical thresholds rather than merely documenting safety efforts.²²

Technology and Products

FutureSearch’s core technology combines LLM-powered research agents with structured forecasting methodologies. The “everyrow” tool enables users to run LLM agents on every row of a dataset, facilitating large-scale automated research and data analysis.²³ This approach allows for systematic investigation of questions that would be prohibitively time-consuming for human researchers working alone.
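The general pattern of running an agent over every row of a dataset can be sketched in a few lines. This is a minimal illustration, not the everyrow API: `run_agent_on_rows` and `placeholder_agent` are hypothetical names, and the placeholder only formats a string where a real system would call an LLM research agent.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, str]

def run_agent_on_rows(
    rows: List[Row],
    agent: Callable[[Row], str],
    max_workers: int = 4,
) -> List[Row]:
    """Apply `agent` to each row and return copies of the rows with an 'answer' field."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the input rows.
        answers = list(pool.map(agent, rows))
    return [{**row, "answer": answer} for row, answer in zip(rows, answers)]

def placeholder_agent(row: Row) -> str:
    """Hypothetical stand-in for an LLM research agent; it does no real research."""
    return f"TODO: research {row['metric']} for {row['company']}"

# Made-up example rows; the company names are placeholders.
rows = [
    {"company": "ExampleCo", "metric": "annual revenue"},
    {"company": "SampleCorp", "metric": "employee count"},
]

results = run_agent_on_rows(rows, placeholder_agent)
for result in results:
    print(result)
```

A thread pool suits this sketch because real agent calls would mostly wait on network responses; the original rows are left unmodified so a failed run can be retried.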

The platform emphasizes legible reasoning alongside final probability estimates. This design responds to longstanding critiques in the forecasting community that valuable models and reasoning often remain hidden behind final numbers.²⁴ By surfacing the research process and causal models underlying predictions, FutureSearch aims to provide decision-makers with actionable insights rather than opaque probability distributions.

The company serves enterprise clients across multiple domains, with demonstrated applications in financial forecasting (stock fundamentals, revenue projections), geopolitical prediction, and AI capability assessments. Named customers include xAI, though specific client relationships and use cases are not publicly detailed.²⁵

Reception and Community Discussion

Within the effective altruism community, FutureSearch has generated discussion about the nature and value of forecasting as an intervention. Supporters appreciate the platform’s emphasis on producing legible models and reasons alongside forecasts, viewing this as addressing a critical gap where “forecasters produce reasons and models that are often more valuable than the final forecasts” but these insights are typically lost.²⁶ Some commentators position FutureSearch as contributing to a broader “rationality engine” that could improve decision-making beyond traditional prediction markets.

However, the platform has also faced framing critiques. Some EA community members argue that FutureSearch should be characterized as “in-depth research with a forecasting flavor” rather than “in-depth forecasting,” given that effective altruism typically associates forecasting with Tetlock-style superforecasting and aggregation of part-time predictors.²⁷ This framing disagreement reflects broader questions about what forecasting methodologies are most valuable for different applications.

More fundamentally, skeptics within the EA community question whether intensive forecasting represents an effective use of resources, viewing it as potentially “nerd-sniping activity”—addictive leaderboard competition rather than high-impact work.²⁸ These critiques apply to forecasting broadly rather than to FutureSearch specifically, but they contextualize debates about the platform’s value proposition.

Criticisms and Limitations

While FutureSearch’s forecasting methodology has demonstrated some successes, the company has acknowledged limitations in its predictions. The significant overestimate of ChatGPT Teams adoption (900,000 predicted versus approximately 100,000 actual subscribers) illustrates challenges in forecasting rapidly evolving markets with limited public data.²⁹

More broadly, research published by FutureSearch itself highlights challenges facing LLM-based forecasting and research tools. The company’s analysis has identified issues with overconfidence in AI-generated forecasts, problems with source selection and reliability, and tendencies toward hallucination in LLM outputs.³⁰ These documented limitations suggest that hybrid human-AI approaches, while promising, require careful design to avoid amplifying rather than mitigating human cognitive biases.
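Why overconfidence matters can be shown with a standard scoring rule. The sketch below uses the Brier score (squared error between a stated probability and the 0/1 outcome, lower is better); the 70% and 95% figures are invented for illustration and do not come from FutureSearch’s research.

```python
def expected_brier(stated_p: float, hit_rate: float) -> float:
    """Expected Brier score when events forecast at `stated_p` occur with frequency `hit_rate`."""
    if not (0.0 <= stated_p <= 1.0 and 0.0 <= hit_rate <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    # With probability hit_rate the outcome is 1, otherwise it is 0.
    return hit_rate * (1.0 - stated_p) ** 2 + (1.0 - hit_rate) * stated_p ** 2

# Illustrative numbers: the events actually happen 70% of the time.
calibrated = expected_brier(stated_p=0.70, hit_rate=0.70)
overconfident = expected_brier(stated_p=0.95, hit_rate=0.70)

print(f"calibrated forecaster:    {calibrated:.4f}")
print(f"overconfident forecaster: {overconfident:.4f}")
```

Both hypothetical forecasters are right equally often, but the one stating 95% receives the worse (higher) expected score, which is the sense in which overconfidence degrades forecast quality.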

The platform’s small team size (approximately six employees) and early-stage funding may constrain its ability to compete with larger organizations developing AI research tools. Companies like Anthropic and OpenAI are developing research-oriented AI systems with substantially more resources, potentially limiting FutureSearch’s market position.

Key Uncertainties

Several important questions remain about FutureSearch’s trajectory and impact:

Forecasting accuracy at scale: While the company demonstrated success in its June 2024 OpenAI forecasts, it remains unclear whether this level of accuracy generalizes across different domains and time horizons. Systematic evaluation of the platform’s track record compared to alternative forecasting methods would clarify its value proposition.

Commercial viability: FutureSearch operates in a competitive landscape where larger AI companies are developing similar research and reasoning tools. Whether the company can sustain a profitable business model while maintaining forecasting quality remains uncertain. The relatively small seed round ($5.79M) may reflect capital efficiency, or it may indicate difficulty attracting larger investments.

Impact on AI safety discourse: FutureSearch’s contributions to AGI timeline forecasting and frontier lab evaluations have entered EA and AI safety discussions, but the ultimate influence of these forecasts on organizational decision-making and policy is unclear. Forecasts about AI timelines face inherent difficulties given the unprecedented nature of AGI development and limited historical precedents for analogous transitions.

Methodology transparency: FutureSearch emphasizes legible reasoning, but external evaluators have limited visibility into its commercial forecasting methodology. Greater documentation of how human and AI contributions combine, how sources are weighted, and how uncertainty is quantified would enable better assessment of the platform’s epistemics.
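To make the question of how human and AI contributions combine concrete, one common pooling rule is a weighted average in log-odds space. The sketch below is purely illustrative: the function names and the `ai_weight` parameter are hypothetical, and nothing here describes FutureSearch’s actual method.

```python
import math

def logit(p: float) -> float:
    """Convert a probability strictly between 0 and 1 to log-odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def sigmoid(x: float) -> float:
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def combine(human_p: float, ai_p: float, ai_weight: float = 0.5) -> float:
    """Pool two probabilities with a weighted average of their log-odds.

    `ai_weight` is a hypothetical tuning parameter: 0 ignores the AI forecast,
    1 ignores the human forecast.
    """
    if not 0.0 <= ai_weight <= 1.0:
        raise ValueError("ai_weight must lie in [0, 1]")
    pooled = (1.0 - ai_weight) * logit(human_p) + ai_weight * logit(ai_p)
    return sigmoid(pooled)

# Made-up inputs: a human says 60%, an AI agent says 80%.
print(f"{combine(0.60, 0.80):.3f}")
```

Publishing a rule of this kind, together with how the weight was chosen, is one form the documentation called for above could take.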