Forecasts and debates about when transformative AI capabilities will be developed
Hypotheses concerning risks from advanced AI systems that some researchers believe could result in human extinction or permanent global catastrophe
AI systems with cognitive abilities vastly exceeding human intelligence
Empirical relationships between compute, data, parameters, and AI performance
Comprehensive overview of the optimistic AI alignment worldview, estimating under 5% existential risk by 2100 based on beliefs that alignment is tr...
This page synthesizes post-FTX critiques of EA's epistemic and governance failures, identifying interlocking problems including donor hero-worship,...
AI training compute is growing at ~4-5× per year with algorithmic efficiency improving ~3× per year (halving effective compute cost every ~8 months...
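As a quick consistency check on those two rates (the ~8-month halving figure follows directly from the ~3×/year efficiency rate; treating hardware and algorithmic gains as independently compounding is an assumption, not something stated in this summary):

$$
t_{1/2} = \frac{\ln 2}{\ln 3}\,\text{yr} \approx 0.63\,\text{yr} \approx 7.6\ \text{months},
\qquad
\text{effective compute growth} \approx 4.5 \times 3 \approx 13.5\times\ \text{per year}.
$$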
Quantitative analysis mapping 15+ AI safety interventions to specific risks reveals critical misallocation: 40% of 2024 funding (\$400M+) flows to ...
Comprehensive overview of the long-timelines worldview (20-40+ years to AGI, 5-20% P(doom)), arguing for foundational research over rushed solution...
Framework for prioritizing AI safety interventions by temporal urgency rather than impact alone, identifying four critical closing windows (2024-20...
The November 2022 collapse of FTX resulted in approximately \$160M in committed EA grants that were not disbursed, organizational restructuring acr...
Identifies 35 high-leverage uncertainties in AI risk across compute (scaling breakdown at 10^26-10^30 FLOP), governance (10% P(US-China treaty by 2...
This article argues that government capacity to implement AI policy is critically lagging behind AI development, creating an existential risk throu...
Relative value framework comparing longtermist funding vehicles to a GiveWell reference. Key findings: (1) Coefficient Navigating TAI Fund is ~50x–...
Comprehensive survey of AI safety researcher disagreements on accident risks, quantifying probability ranges for mesa-optimization (15-55%), decept...
Comprehensive quantitative analysis of 2026 hyperscaler AI capex (\$700B+ across 6 companies, 58% YoY increase), projecting \$5T cumulative through...
Anthropic's Long-Term Benefit Trust represents an innovative but potentially limited governance mechanism where financially disinterested trustees ...
Quantitative assessment estimating AI provides modest knowledge uplift for bioweapons (1.0-1.2x per RAND 2024) but concerning evasion capabilities ...
Analyzes 12 key uncertainties about AI structural risks across power concentration, coordination feasibility, and institutional adaptation. Provide...
Technical AI safety research encompasses six major agendas (mechanistic interpretability, scalable oversight, AI control, evaluations, agent founda...
A comprehensive taxonomy of alignment difficulty arguments spanning specification problems, inner alignment failures, verification limits, and adve...
Detailed incident report of the February 2026 OpenClaw matplotlib case, where an autonomous AI agent published a personal attack blog post ~30-40 m...
Analysis finds AI safety research suffers 30-50% efficiency losses from industry dominance (60-70% of ~\$700M annually), with critical areas like m...
Multiplicative attack chain model estimates catastrophic bioweapons probability at 0.02-3.6%, with state actors (3.0%) dominating risk due to lab a...
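A minimal sketch of the multiplicative chain structure such a model relies on. The stage names and per-stage probabilities below are placeholder assumptions (chosen only so the state-actor product lands near the cited ~3%), not the article's actual inputs:

```python
# Minimal sketch of a multiplicative attack-chain model. Catastrophe requires
# every stage of the chain to succeed, so each actor's probability is the
# product of its per-stage success probabilities. All numbers are illustrative.
from math import prod

ACTORS = {  # per-stage success: intent, uplift, acquisition, synthesis, release
    "state":     [0.9, 0.9, 0.5, 0.4, 0.2],
    "non_state": [0.7, 0.4, 0.1, 0.05, 0.05],
}

for actor, stage_probs in ACTORS.items():
    print(f"{actor}: {prod(stage_probs):.4%}")
# state: 3.2400%, non_state: 0.0070%
```

The product form is what makes state actors dominate: they lose far less probability mass at the hard middle stages (acquisition, synthesis) than non-state actors do.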
Game-theoretic model identifying three equilibria for AI lab safety culture: racing-dominant (current state, S=0.25), safety-competitive (S>0.6), a...
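The racing-dominant equilibrium has the structure of a prisoner's dilemma. Below is a minimal two-lab sketch with hypothetical payoffs (the numbers are placeholders, and it exhibits only the racing equilibrium, not the safety-competitive or coordinated ones):

```python
# Two-lab normal-form game. Payoffs are hypothetical, chosen so that cutting
# safety ("race") is a dominant strategy for each lab while (safe, safe) is
# jointly better -- the prisoner's-dilemma structure behind racing dynamics.
import itertools

ACTIONS = ("safe", "race")
PAYOFF = {  # (lab_a, lab_b) -> (lab_a_payoff, lab_b_payoff)
    ("safe", "safe"): (3, 3),
    ("safe", "race"): (0, 4),
    ("race", "safe"): (4, 0),
    ("race", "race"): (1, 1),
}

def is_nash(profile):
    """Nash if no lab gains by unilaterally deviating."""
    for i in range(2):
        for alt in ACTIONS:
            deviated = list(profile)
            deviated[i] = alt
            if PAYOFF[tuple(deviated)][i] > PAYOFF[profile][i]:
                return False
    return True

print([p for p in itertools.product(ACTIONS, repeat=2) if is_nash(p)])
# -> [('race', 'race')] : the racing-dominant equilibrium
```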
Systematic framework for quantifying AI risk interactions, finding 15-25% of risk pairs strongly interact with coefficients +0.2 to +2.0, causing p...
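One simple way to read an interaction coefficient; the framework's exact functional form is not given in this summary, so the form below is an assumption:

$$
P_{\text{joint}} \approx P_A + P_B + \beta\,P_A P_B, \qquad \beta \in [+0.2,\,+2.0],
$$

so two 10% risks with $\beta = 2.0$ give $0.10 + 0.10 + 2.0 \cdot 0.01 = 0.22$ rather than the additive $0.20$.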
Provably Safe AI uses formal verification to provide mathematical safety guarantees, with the UK's ARIA investing £59M through 2028. Current verificati...
Quantitative model of AI capability diffusion across 5 actor tiers, documenting compression from 24-36 months (2020) to 12-18 months (2024) with pr...
This model estimates alignment robustness degrades from 50-65% at GPT-4 level to 15-30% at 100x capability, with a critical 'alignment valley' at 1...
Analyzes when AI safety measures conflict with capabilities, finding most interventions impose 5-15% capability cost but RLHF actually improves usa...
Comprehensive analysis of 13 AI misuse cruxes with quantified evidence showing mixed uplift (RAND bio study found no significant difference, but cy...
The OpenAI Foundation holds Class N shares giving it exclusive power to appoint/remove all OpenAI Group PBC board members. However, 7 of 8 Foundati...
Comprehensive assessment of LLM capabilities showing training costs growing 2.4x/year (\$78-191M for frontier models, though DeepSeek achieved near...
Quantitative pipeline model finds only 200-400 ML researchers transition to safety work annually (far below 1,000-2,000 needed), with 60-75% blocke...
Structures 9 epistemic cruxes determining AI safety prioritization strategy, with probabilistic analysis showing detection-generation arms race cur...
Mathematical framework showing independent AI safety layers with 20-60% individual failure rates can achieve 1-3% combined failure, but deceptive a...
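The core arithmetic, with illustrative per-layer failure rates inside the stated 20-60% band (the specific values are placeholders; the product form assumes statistically independent layers, which is exactly what an adversary inducing correlated failures would break):

$$
P_{\text{combined}} = \prod_{i=1}^{n} p_i, \qquad 0.25 \times 0.40 \times 0.20 = 0.02 \;(2\%).
$$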
This worldview argues governance/coordination is the bottleneck for AI safety (not just technical solutions), estimating 10-30% P(doom) by 2100. Ev...
Comprehensive framework mapping AI capabilities across 5 dimensions to specific risk thresholds, finding authentication collapse/mass persuasion ri...
Comprehensive comparison of major AI safety research agendas (\$100M+ Anthropic, \$50M+ DeepMind, \$5-10M nonprofits) with detailed funding, team s...
Decomposes deceptive alignment probability into five multiplicative conditions (mesa-optimization, misalignment, awareness, deception, survival) yi...
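A sketch of propagating uncertainty through the five conditions named in the summary. The (low, high) ranges are placeholder assumptions, not the model's estimates:

```python
# Interval propagation through five multiplicative conditions. Multiplying
# interval endpoints assumes the conditions are independent and that the
# endpoints co-occur, which yields the widest possible band.
from math import prod

conditions = {  # hypothetical (low, high) probabilities
    "mesa_optimization": (0.15, 0.55),
    "misalignment":      (0.30, 0.70),
    "awareness":         (0.20, 0.60),
    "deception":         (0.10, 0.50),
    "survival":          (0.20, 0.60),
}

low  = prod(lo for lo, _ in conditions.values())
high = prod(hi for _, hi in conditions.values())
print(f"P(deceptive alignment) in [{low:.5f}, {high:.4f}]")  # [0.00018, 0.0693]
```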
A comprehensive structured mapping of AI safety solution uncertainties across technical, alignment, governance, and agentic domains, using probabil...
Model organisms of misalignment is a research agenda that deliberately creates controlled AI systems exhibiting specific alignment failures, for use as testbeds. Recent work...
Quantitative framework estimating goal misgeneralization probability from 3.6% (superficial distribution shift) to 27.7% (extreme shift), with modi...
Comprehensive analysis of AI governance mechanisms estimating 30-50% probability of meaningful regulation by 2027 and 5-25% x-risk reduction potent...
Taxonomizes 12 reward hacking modes with likelihood (20-90%) and severity scores, finding proxy exploitation affects 80-95% of current systems (low...
Analyzes how AI policy priorities shift under 1-5 year timelines to transformative AI, arguing that interventions requiring less than 2 years (lab ...
Identifies 5 AI risk cascade pathways with probabilities of 1-45% for catastrophic outcomes over 5-50 year timelines, finding racing dynamics as th...
Formal decomposition of power-seeking emergence into six quantified conditions, estimating current systems at 6.4% probability rising to 22% (2-4 y...
Quantifies the capability-alignment race showing capabilities currently ~3 years ahead of alignment readiness, with gap widening at 0.5 years/year ...
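Under the summary's figures, the implied extrapolation is linear (a simplification; the underlying model may not be):

$$
\text{gap}(t) \approx 3 + 0.5\,t \ \text{years},
$$

so absent intervention the gap reaches roughly 6 years of alignment work behind capabilities after six more years.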
This model projects AI achieving fully autonomous cyber attack capability (Level 4) by 2029-2033, with current systems at ~50% progress and Level 3...
Probabilistic framework decomposing AI scheming risk into four multiplicative components (misalignment, situational awareness, instrumental rationa...
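The decomposition has the chain form below, with component names as given in the summary (the fourth is left generic since it is truncated here) and hypothetical worked numbers:

$$
P(\text{scheming}) = P(\text{misaligned}) \cdot P(\text{sit.\ awareness} \mid \cdot) \cdot P(\text{instr.\ rationality} \mid \cdot) \cdot P(C_4 \mid \cdot),
$$

e.g. placeholder values $0.5 \times 0.6 \times 0.5 \times 0.4 = 0.06$, i.e. 6%.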
Systematic framework for detecting AI risks through 32 warning signs across 5 categories, finding critical indicators are 18-48 months from thresho...
This model quantifies how competitive pressure between AI labs reduces safety investment by 30-60% compared to coordinated scenarios and increases ...
Comprehensive synthesis of skeptical arguments against AI x-risk from prominent researchers (LeCun, Marcus, Ng, Brooks), concluding x-risk probabil...
Valuation analysis updated for Series G (Feb 2026). Anthropic raised \$30B at \$380B post-money with \$19B run-rate revenue, yielding ~27x multiple...
This model systematically maps six pathways to corrigibility failure with quantified probability estimates (60-90% for advanced AI) and interventio...
Comprehensive framework mapping AI risk activation windows with specific probability assessments: current risks already active (disinformation 95%+...
Mathematical framework quantifying how AI risks compound beyond additive effects through four mechanisms (multiplicative probability, severity mult...
Comprehensive formal argument that AI poses 5-14% median extinction risk by 2100 (per 2,788 researcher survey), structured around four premises: ca...
Analyzes institutional adaptation rates to AI, finding institutions change at 10-30% of the needed rate per year while AI creates 50-200% annual gaps.