Forecasts and debates about when transformative AI capabilities will be developed
Hypotheses concerning risks from advanced AI systems that some researchers believe could result in human extinction or permanent global catastrophe...
AI systems with cognitive abilities vastly exceeding human intelligence
Empirical relationships between compute, data, parameters, and AI performance
Comprehensive overview of the optimistic AI alignment worldview, estimating under 5% existential risk by 2100 based on beliefs that alignment is tractable...
Comprehensive analysis of compute thresholds (EU: 10^25 FLOP, US: 10^26 FLOP) as regulatory triggers for AI governance, documenting that algorithmic...
This page synthesizes post-FTX critiques of EA's epistemic and governance failures, identifying interlocking problems including donor hero-worship,...
AI training compute is growing at ~4-5× per year with algorithmic efficiency improving ~3× per year (halving effective compute cost every ~8 months...
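A quick consistency check on those rates (a minimal sketch; the 4.5x and 3x values are midpoints of the ranges quoted above):

```python
import math

# Quoted growth rates (midpoints of the ranges in the summary above)
compute_growth = 4.5    # training compute: ~4-5x per year
algo_efficiency = 3.0   # algorithmic efficiency: ~3x per year

# Halving time for effective-compute cost implied by 3x/year efficiency:
# solve (1/3)^t = 1/2  =>  t = ln(2) / ln(3) years
halving_years = math.log(2) / math.log(3)
print(f"cost halving time: {halving_years * 12:.1f} months")  # ~7.6, i.e. ~8 months

# Combined growth in effective training compute (physical compute x algorithmic gains)
print(f"effective compute growth: ~{compute_growth * algo_efficiency:.0f}x per year")  # ~14x
```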
Identifies 35 high-leverage uncertainties in AI risk across compute (scaling breakdown at 10^26-10^30 FLOP), governance (10% P(US-China treaty by 2...
Comprehensive analysis of international AI coordination mechanisms shows growing but limited progress: 11-country AI Safety Institute network with ...
This article argues that government capacity to implement AI policy is critically lagging behind AI development, creating an existential risk through...
Quantitative analysis mapping 15+ AI safety interventions to specific risks reveals critical misallocation: 40% of 2024 funding ($400M+) flows to R...
The November 2022 collapse of FTX resulted in approximately $160M in committed EA grants that were not disbursed, organizational restructuring across...
Comprehensive overview of the long-timelines worldview (20-40+ years to AGI, 5-20% P(doom)), arguing for foundational research over rushed solutions...
Relative value framework comparing longtermist funding vehicles to a GiveWell reference. Key findings: (1) Coefficient Navigating TAI Fund is ~50x–...
Comprehensive survey of AI safety researcher disagreements on accident risks, quantifying probability ranges for mesa-optimization (15-55%), decept...
Framework for prioritizing AI safety interventions by temporal urgency rather than impact alone, identifying four critical closing windows (2024-20...
Quantitative assessment estimating AI provides modest knowledge uplift for bioweapons (1.0-1.2x per RAND 2024) but more substantial evasion capabilities...
Comprehensive analysis of pause/moratorium proposals finding they would provide very high safety benefits if implemented (buying time for safety research...
Comprehensive quantitative analysis of 2026 hyperscaler AI capex ($700B+ across 6 companies, 58% YoY increase), projecting $5T cumulative through 2...
Multiplicative attack chain model estimates catastrophic bioweapons probability at 0.02-3.6%, with state actors (3.0%) showing highest estimated risk...
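Since the model is a multiplicative chain, the headline number is a product of stage probabilities. The stage names and values below are illustrative assumptions chosen so the product lands near the quoted 3.0% state-actor figure; they are not the source model's estimates:

```python
# Illustrative attack-chain stages for a state actor; each value is an
# assumed conditional probability, picked only so the product matches
# the 3.0% headline figure above.
state_actor_chain = {
    "attempts_program":     0.50,
    "acquires_agent":       0.40,
    "weaponizes_at_scale":  0.30,
    "deploys_successfully": 0.50,
}

p = 1.0
for stage, prob in state_actor_chain.items():
    p *= prob
print(f"P(catastrophic outcome) = {p:.3f}")  # 0.030 -> 3.0%
```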
A comprehensive taxonomy of alignment difficulty arguments spanning specification problems, inner alignment failures, verification limits, and adversarial...
Provable Safe AI uses formal verification to provide mathematical safety guarantees, with UK's ARIA investing £59M through 2028. Current verification...
Anthropic's Long-Term Benefit Trust represents an innovative but potentially limited governance mechanism where financially disinterested trustees ...
Comprehensive analysis of AI misuse cruxes with quantified evidence across bioweapons (RAND bio study found no significant difference; novice uplift...
Detailed incident report of the February 2026 OpenClaw matplotlib case, where an autonomous AI agent published a personal attack blog post ~30-40 minutes...
Comprehensive analysis of AI standards bodies (ISO/IEC, IEEE, NIST, CEN-CENELEC) showing how voluntary technical standards become de facto requirements...
Technical AI safety research encompasses six major agendas (mechanistic interpretability, scalable oversight, AI control, evaluations, agent foundations...
This model estimates alignment robustness degrades from 50-65% at GPT-4 level to 15-30% at 100x capability, with a critical 'alignment valley' at 1...
Game-theoretic model identifying three equilibria for AI lab safety culture: racing-dominant (current state, S=0.25), safety-competitive (S>0.6), a...
Analyzes 12 key uncertainties about AI structural risks across power concentration, coordination feasibility, and institutional adaptation. Provides...
A comprehensive structured mapping of AI safety solution uncertainties across technical, alignment, governance, and agentic domains, using probabil...
Analyzes when AI safety measures conflict with capabilities, finding most interventions impose 5-15% capability cost but RLHF actually improves usa...
Analysis finds AI safety research suffers 30-50% efficiency losses from industry dominance (60-70% of ~$700M annually), with critical areas like mu...
The OpenAI Foundation holds Class N shares giving it exclusive power to appoint/remove all OpenAI Group PBC board members. However, 7 of 8 Foundation...
Systematic framework for quantifying AI risk interactions, finding 15-25% of risk pairs strongly interact with coefficients +0.2 to +2.0, causing p...
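One way to operationalize such an interaction coefficient is sketched below; the functional form (independent union plus a c-weighted joint term) is my assumption for illustration, not necessarily the framework's:

```python
def combined_risk(p1: float, p2: float, c: float) -> float:
    """Pairwise risk with an interaction coefficient c.
    c = 0 recovers independence; c > 0 amplifies the joint term.
    (Assumed functional form for illustration.)"""
    independent = p1 + p2 - p1 * p2   # P(A or B) under independence
    amplification = c * p1 * p2       # extra joint risk from interaction
    return min(1.0, independent + amplification)

# Two 10% risks, first independent, then with c = 2.0 (top of the quoted range)
print(combined_risk(0.10, 0.10, 0.0))  # 0.19
print(combined_risk(0.10, 0.10, 2.0))  # 0.21
```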
Quantitative model of AI capability diffusion across 5 actor tiers, documenting compression from 24-36 months (2020) to 12-18 months (2024) with pr...
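A naive constant-rate extrapolation of that compression (an assumption; the model's own projections may differ) looks like this:

```python
# Diffusion lag roughly halved from ~30 to ~15 months between 2020 and 2024
# (midpoints of the quoted ranges). Assume a constant annual compression factor.
lag_2020, lag_2024 = 30.0, 15.0
annual_factor = (lag_2024 / lag_2020) ** (1 / 4)  # ~0.84x per year

for year in (2026, 2028):
    print(year, f"~{lag_2024 * annual_factor ** (year - 2024):.1f} months")
# 2026: ~10.6 months, 2028: ~7.5 months
```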
Comprehensive assessment of LLM capabilities showing training costs growing 2.4x/year ($78-191M for frontier models, though DeepSeek achieved near-...
Structures 9 epistemic cruxes determining AI safety prioritization strategy, with probabilistic analysis showing detection-generation arms race currently...
Valuation analysis updated March 2026. Series G closed at $380B (Feb 2026) with $14B run-rate; by March 2026, secondary/derivatives markets price A...
This worldview argues governance/coordination is the bottleneck for AI safety (not just technical solutions), estimating 10-30% P(doom) by 2100. Ev...
Model organisms of misalignment is a research agenda that deliberately builds AI systems exhibiting specific alignment failures, to serve as controlled testbeds. Recent work...
Quantitative pipeline model finds only 200-400 ML researchers transition to safety work annually (far below the 1,000-2,000 needed), with 60-75% blocked...
Mathematical framework showing independent AI safety layers with 20-60% individual failure rates can achieve 1-3% combined failure, but deceptive alignment...
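The arithmetic behind both claims, using assumed per-layer failure rates within the quoted 20-60% range and a crude mixture model for correlated failure:

```python
from math import prod

# Assumed per-layer failure rates within the quoted 20-60% range
layers = [0.5, 0.4, 0.3, 0.25]  # e.g. evals, interpretability, control, oversight

independent_failure = prod(layers)
print(f"independent layers: {independent_failure:.1%} combined failure")  # 1.5%

# A deceptively aligned model can defeat layers in a correlated way.
# Crude mixture model: with probability rho all layers fail together
# (behaving as one layer); otherwise they fail independently.
rho = 0.3
correlated_failure = rho * max(layers) + (1 - rho) * independent_failure
print(f"with correlation rho={rho}: {correlated_failure:.1%}")  # ~16%
```

Even modest correlation wipes out most of the benefit of stacking layers, which is why the summary singles out deception as the failure mode that breaks the framework.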
Quantifies the capability-alignment race showing capabilities currently ~3 years ahead of alignment readiness, with gap widening at 0.5 years/year ...
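The implied linear model (baseline year assumed to be 2025, since the summary says "currently"):

```python
# Toy linear extrapolation from the quoted figures: a 3-year gap today,
# widening at 0.5 years per year. Baseline year 2025 is an assumption.
BASE_YEAR, BASE_GAP, WIDENING = 2025, 3.0, 0.5

def alignment_gap(year: int) -> float:
    return BASE_GAP + WIDENING * (year - BASE_YEAR)

for year in (2025, 2030, 2035):
    print(year, f"{alignment_gap(year):.1f} years")
# 2025: 3.0, 2030: 5.5, 2035: 8.0
```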
Comprehensive analysis of AI governance mechanisms estimating 30-50% probability of meaningful regulation by 2027 and 5-25% x-risk reduction potential...
Comprehensive framework mapping AI capabilities across 5 dimensions to specific risk thresholds, finding authentication collapse/mass persuasion risks...
Analyzes how AI policy priorities shift under 1-5 year timelines to transformative AI, arguing that interventions requiring less than 2 years (lab ...
Comprehensive comparison of major AI safety research agendas ($100M+ Anthropic, $50M+ DeepMind, $5-10M nonprofits) with detailed funding, team sizes...
Decomposes deceptive alignment probability into five multiplicative conditions (mesa-optimization, misalignment, awareness, deception, survival) yielding...
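A sketch of the decomposition as a straight product; the per-condition probabilities below are placeholders, not the model's published estimates:

```python
from math import prod

# Five conditions named in the decomposition above; all values are
# illustrative placeholders, not the source model's estimates.
conditions = {
    "mesa_optimization": 0.35,  # P(model develops an inner optimizer)
    "misalignment":      0.50,  # P(inner goal differs | mesa-optimizer)
    "awareness":         0.40,  # P(model models its training process)
    "deception":         0.40,  # P(model acts aligned instrumentally)
    "survival":          0.60,  # P(deceptive policy survives training)
}
print(f"P(deceptive alignment) = {prod(conditions.values()):.3f}")  # ~0.017
```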
Comprehensive analysis of international AI compute governance finds 10-25% chance of meaningful regimes by 2035, but potential for 30-60% reduction...
Formal decomposition of power-seeking emergence into six quantified conditions, estimating current systems at 6.4% probability rising to 22% (2-4 years...
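Under a pure product model (an assumption; the decomposition may weight conditions unequally), the headline figures imply surprisingly high per-condition probabilities:

```python
# If power-seeking requires six conditions jointly (product model), the
# per-condition probability implied by each headline number is the 6th root.
for p_joint, label in [(0.064, "current systems"), (0.22, "2-4 years out")]:
    per_condition = p_joint ** (1 / 6)
    print(f"{label}: joint {p_joint:.1%} -> ~{per_condition:.2f} per condition")
# current systems: joint 6.4% -> ~0.63 per condition
# 2-4 years out:   joint 22.0% -> ~0.78 per condition
```

Moving each condition from ~0.63 to ~0.78 is enough to more than triple the joint probability, which is why small per-condition capability gains compound so quickly in this model.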
Probabilistic framework decomposing AI scheming risk into four multiplicative components (misalignment, situational awareness, instrumental rationality...
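The fourth component is truncated above, so "opportunity" below is a placeholder name, and all values are illustrative. The sketch highlights the key property of multiplicative decompositions, that halving any single component halves the joint risk:

```python
from math import prod

# Illustrative component probabilities; "opportunity" is a placeholder
# name for the truncated fourth component above.
components = {
    "misalignment":             0.5,
    "situational_awareness":    0.6,
    "instrumental_rationality": 0.7,
    "opportunity":              0.4,
}
baseline = prod(components.values())
print(f"baseline P(scheming) = {baseline:.3f}")  # 0.084

# Halving ANY one component halves the product, so intervention choice
# reduces to whichever component is cheapest to halve.
for name in components:
    print(f"halving {name}: {baseline / 2:.3f}")  # 0.042 in every case
```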
Quantitative framework estimating goal misgeneralization probability from 3.6% (superficial distribution shift) to 27.7% (extreme shift), with modifiers...
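One way to interpolate between the two quoted endpoints (log-linear in an assumed shift-severity parameter s in [0, 1]; the framework's own modifiers may differ):

```python
import math

# Endpoints quoted above; the log-linear interpolation scheme is my
# assumption, not the framework's.
P_SUPERFICIAL, P_EXTREME = 0.036, 0.277

def p_misgeneralization(severity: float) -> float:
    log_p = (1 - severity) * math.log(P_SUPERFICIAL) + severity * math.log(P_EXTREME)
    return math.exp(log_p)

for s in (0.0, 0.5, 1.0):
    print(f"shift severity {s:.1f}: {p_misgeneralization(s):.1%}")
# 0.0: 3.6%, 0.5: ~10.0%, 1.0: 27.7%
```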
This model estimates AI's marginal electoral impact across three vectors: disinformation influence, infrastructure attacks, and voter suppression...
Taxonomizes 12 reward hacking modes with likelihood (20-90%) and severity scores, finding proxy exploitation affects 80-95% of current systems (low...
Surveys US legal authority (DPA, IEEPA, CLOUD Act, FISA 702) over $700B+ in commercial AI infrastructure concentrated in 5-6 companies, concluding ...
Comprehensive tracking of US state AI legislation shows explosive growth from ~40 bills in 2019 to 1,080+ in 2025, with only 11% passage rate but r...
This model systematically maps six pathways to corrigibility failure with quantified probability estimates (60-90% for advanced AI) and intervention...
Identifies 5 AI risk cascade pathways with probabilities of 1-45% for catastrophic outcomes over 5-50 year timelines, finding racing dynamics as the...