AI training compute is growing at ~4-5× per year with algorithmic efficiency improving ~3× per year (halving effective compute cost every ~8 months...
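As a quick consistency check on the figures above (a minimal sketch; the ~4-5x hardware and ~3x algorithmic rates are taken directly from the summary):

```python
import math

# Algorithmic efficiency improving ~3x per year implies the effective
# compute cost halves every ln(2) / ln(3) years.
halving_years = math.log(2) / math.log(3)
halving_months = 12 * halving_years  # ~7.6 months, consistent with "~8 months"

# Combined effective-compute growth: hardware scaling times algorithmic gains.
combined_low, combined_high = 4 * 3, 5 * 3  # ~12-15x effective compute per year

print(f"{halving_months:.1f} months; {combined_low}-{combined_high}x per year")
```

The ~8-month halving figure is thus just the 3x-per-year rate restated in doubling-time form.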
Quantitative analysis mapping 15+ AI safety interventions to specific risks reveals critical misallocation: 40% of 2024 funding ($400M+) flows to ...
Framework for prioritizing AI safety interventions by temporal urgency rather than impact alone, identifying four critical closing windows (2024-20...
Relative value framework comparing longtermist funding vehicles to a GiveWell reference. Key findings: (1) Coefficient Navigating TAI Fund is ~50x–...
Comprehensive quantitative analysis of 2026 hyperscaler AI capex ($700B+ across 6 companies, 58% YoY increase), projecting $5T cumulative through...
Quantitative assessment estimating AI provides modest knowledge uplift for bioweapons (1.0-1.2x per RAND 2024) but concerning evasion capabilities ...
Analysis finds AI safety research suffers 30-50% efficiency losses from industry dominance (60-70% of ~$700M annually), with critical areas like m...
Multiplicative attack chain model estimates catastrophic bioweapons probability at 0.02-3.6%, with state actors (3.0%) dominating risk due to lab a...
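The multiplicative attack-chain structure described above can be sketched as follows. The stage names and per-stage probabilities here are illustrative assumptions, not the model's actual parameters; only the 0.02-3.6% overall range comes from the summary:

```python
# Hypothetical attack chain: overall success probability is the product of
# conditional stage probabilities, so any weak link suppresses the total.
stages = {
    "acquire_agent": 0.3,
    "weaponize": 0.2,
    "evade_detection": 0.5,
    "deliver": 0.4,
}

p_catastrophe = 1.0
for stage, p in stages.items():
    p_catastrophe *= p

print(f"{p_catastrophe:.3f}")  # 0.012 with these illustrative numbers
```

The multiplicative form explains why state actors dominate the estimate: lab access effectively sets several early-chain probabilities near 1, removing the weak links that suppress the product for other actors.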
Game-theoretic model identifying three equilibria for AI lab safety culture: racing-dominant (current state, S=0.25), safety-competitive (S>0.6), a...
Systematic framework for quantifying AI risk interactions, finding 15-25% of risk pairs strongly interact with coefficients +0.2 to +2.0, causing p...
Quantitative model of AI capability diffusion across 5 actor tiers, documenting compression from 24-36 months (2020) to 12-18 months (2024) with pr...
This model estimates alignment robustness degrades from 50-65% at GPT-4 level to 15-30% at 100x capability, with a critical 'alignment valley' at 1...
Analyzes when AI safety measures conflict with capabilities, finding most interventions impose 5-15% capability cost but RLHF actually improves usa...
Quantitative pipeline model finds only 200-400 ML researchers transition to safety work annually (far below 1,000-2,000 needed), with 60-75% blocke...
Mathematical framework showing independent AI safety layers with 20-60% individual failure rates can achieve 1-3% combined failure, but deceptive a...
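Under the independence assumption in this framework, combined failure is the product of per-layer failure rates; a minimal sketch (the specific layer rates are illustrative picks from the 20-60% band):

```python
# Three independent safety layers with failure rates in the 20-60% band.
layer_failures = [0.3, 0.2, 0.4]

p_all_fail = 1.0
for p in layer_failures:
    p_all_fail *= p
# 0.3 * 0.2 * 0.4 = 0.024, i.e. ~2.4%, inside the 1-3% combined range.

# A deceptive adversary that correlates failures breaks independence:
# if one bypass defeats every layer at once, combined failure reverts
# toward the weakest single layer (20-40% here), not the ~2.4% product.
print(f"{p_all_fail:.3f}")
```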
Comprehensive framework mapping AI capabilities across 5 dimensions to specific risk thresholds, finding authentication collapse/mass persuasion ri...
Decomposes deceptive alignment probability into five multiplicative conditions (mesa-optimization, misalignment, awareness, deception, survival) yi...
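The five conditions are named in the summary, but the per-condition probability bounds below are illustrative placeholders; the sketch just shows how multiplying interval endpoints produces the wide overall range characteristic of this kind of decomposition:

```python
# Low/high probability bounds per condition (illustrative, not the
# model's estimates). Multiplying endpoints bounds the joint probability.
conditions = {
    "mesa_optimization":      (0.2, 0.6),
    "misalignment":           (0.3, 0.7),
    "situational_awareness":  (0.2, 0.5),
    "deception":              (0.2, 0.6),
    "survival_to_deployment": (0.3, 0.7),
}

low = high = 1.0
for lo, hi in conditions.values():
    low *= lo
    high *= hi

print(f"{low:.4f} to {high:.4f}")  # roughly two orders of magnitude apart
```

Even modest per-condition uncertainty compounds across five factors, which is why decompositions like this tend to yield estimates spanning orders of magnitude.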
Quantitative framework estimating goal misgeneralization probability from 3.6% (superficial distribution shift) to 27.7% (extreme shift), with modi...
Taxonomizes 12 reward hacking modes with likelihood (20-90%) and severity scores, finding proxy exploitation affects 80-95% of current systems (low...
Identifies 5 AI risk cascade pathways with probabilities of 1-45% for catastrophic outcomes over 5-50 year timelines, finding racing dynamics as th...
Formal decomposition of power-seeking emergence into six quantified conditions, estimating current systems at 6.4% probability rising to 22% (2-4 y...
This model projects AI achieving fully autonomous cyber attack capability (Level 4) by 2029-2033, with current systems at ~50% progress and Level 3...
Probabilistic framework decomposing AI scheming risk into four multiplicative components (misalignment, situational awareness, instrumental rationa...
Systematic framework for detecting AI risks through 32 warning signs across 5 categories, finding critical indicators are 18-48 months from thresho...
This model quantifies how competitive pressure between AI labs reduces safety investment by 30-60% compared to coordinated scenarios and increases ...
This model systematically maps six pathways to corrigibility failure with quantified probability estimates (60-90% for advanced AI) and interventio...
Comprehensive framework mapping AI risk activation windows with specific probability assessments: current risks already active (disinformation 95%+...
Mathematical framework quantifying how AI risks compound beyond additive effects through four mechanisms (multiplicative probability, severity mult...
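A minimal sketch of one compounding mechanism, severity multiplication, against the additive baseline. The functional form and the numbers are illustrative assumptions, not the framework's actual equations:

```python
# Illustrative functional form: combined severity with an interaction term.
# k > 0 pushes the total beyond the additive baseline.
def combined_severity(s_a: float, s_b: float, k: float) -> float:
    return s_a + s_b + k * s_a * s_b

additive = combined_severity(0.3, 0.4, 0.0)   # 0.70: no interaction
compound = combined_severity(0.3, 0.4, 2.0)   # 0.94: interaction amplifies

print(f"{additive:.2f} {compound:.2f}")
```

The gap between the two outputs is the "beyond additive" effect the framework quantifies: treating interacting risks as independent systematically understates combined severity.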
Analyzes institutional adaptation rates to AI, finding institutions change at 10-30% of needed rate per year while AI creates 50-200% annual gaps. ...
Quantitative model projects LAWS will proliferate 4-6x faster than nuclear weapons, reaching 60 nations by 2030 and non-state operational use by 20...
Quantifies AI safety talent shortage: current 300-800 unfilled positions (30-50% gap) with training pipelines producing only 220-450 researchers an...
Projects when AI-generated content becomes undetectable across modalities: text detection already at ~50% (random chance), images declining 5-10% a...
This model estimates AI disinformation's marginal electoral impact by decomposing the causal pathway from AI capability to vote shifts. Analysis fi...
Analyzes five thresholds where AI speed exceeds human control capacity (oversight, intervention, comprehension, cascade, recursive), finding T1-T2 ...
Models adversarial dynamics between AI generation and detection of synthetic content, projecting detection accuracy will fall from 65% (2024) to ~5...
Models institutional trust as a network contagion problem, finding cascades become irreversible below 30-40% trust thresholds and that AI multiplie...
Quantitative model finding current societal response capacity at 20-25% adequacy with 3-5 year institutional lag, requiring $550M-1.1B/year invest...
Quantitative portfolio framework recommending AI safety resource allocation: 40-70% to misalignment, 15-35% to misuse, 10-25% to structural risks, ...
Game-theoretic analysis of AI competition traps showing universal cooperation probability drops from 81% (2 actors) to 21% (15 actors), with 5-10% ...
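The 81% to 21% drop is consistent with each actor cooperating independently with probability ~0.9, since universal cooperation then scales as p^n; a quick check (the per-actor 0.9 is inferred from the summary's own endpoints):

```python
# If each actor cooperates independently with probability p,
# universal cooperation occurs with probability p**n.
p = 0.9
for n in (2, 5, 10, 15):
    print(n, round(p**n, 2))
# n=2 gives 0.81 and n=15 gives 0.21, matching the summary's figures.
```

This is why adding actors is so corrosive: even near-reliable individual cooperation decays geometrically in the number of players.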
This model quantifies how AI assistance degrades human expertise through cascading feedback loops across individual (1-5 years), institutional (5-1...
Systematic analysis identifying racing dynamics as a hub risk enabling 8 downstream risks with 2-5x amplification, and showing compound risk scenar...
This framework maps beliefs about AI timelines (short/medium/long), alignment difficulty (hard/medium/tractable), and coordination feasibility (fea...
Using historical regime collapse data (military regimes: 9 years, single-party: 30 years) and evidence from 80+ countries adopting surveillance tec...
Carlsmith's framework decomposes AI existential risk into six conditional premises (timelines, incentives, alignment difficulty, power-seeking, dis...
Comprehensive risk framework for mesa-optimization estimating 10-70% emergence probability in frontier systems with 50-90% conditional misalignment...
System dynamics model analyzing feedback loops between media coverage, public concern, and AI policy using coupled differential equations. Finds 6-...
Analyzes whistleblower dynamics in AI labs using expected utility framework, estimating current barriers suppress 70-90% of critical safety informa...
Quantitative framework finding self-preservation converges in 95-99% of AI goal structures with 70-95% pursuit likelihood, while goal-content integ...
Analyzes how AI systems erode institutional trust through deepfakes, disinformation, and authentication collapse, finding trust erodes 3-10x faster...
Economic model analyzing AI safety research returns, recommending 3-10x funding increases from current ~$500M/year to $2-5B, with highest margina...
Analyzes autonomous weapons escalation risk through 10,000x speed differential between human decision-making (5-30 minutes) and machine cycles (0.2...
Quantifies how AI surveillance reduces freedom of expression through self-censorship mechanisms, estimating 50-70% reduction in dissent within mont...
Models AI sycophancy as multi-level feedback loops where validation increases user dependency (modeled with differential equations showing equilibr...
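A toy single-equation version of the dependency feedback loop described above, assuming a linear form dD/dt = a*V - b*D (my assumption, not the model's actual system), which settles to the equilibrium D* = a*V/b:

```python
# Euler integration of dD/dt = a*V - b*D: dependency D grows with
# validation V and decays on its own, converging to D* = a*V/b.
a, b, V = 0.5, 0.25, 1.0  # illustrative rate constants
D, dt = 0.0, 0.01

for _ in range(5000):  # integrate to t = 50
    D += dt * (a * V - b * D)

print(f"{D:.2f}")  # approaches the equilibrium a*V/b = 2.00
```

The existence of a stable equilibrium is the point: once validation feedback is in place, dependency does not need to keep being pushed upward; it settles at an elevated level on its own.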
Models cyber offense-defense balance with AI, projecting 30-70% net attack success improvement (B_OD ratio 1.2-1.8, best estimate 1.45) driven by a...
This model analyzes how AI surveillance technologies diffuse to authoritarian regimes through commercial sales, development assistance, joint ventu...
Five-phase model tracking progression from AI augmentation to irreversible skill loss, finding humans decline to 50-70% baseline capability in Phas...
Analyzes when AI decisions become permanently locked-in across technical, economic, social, political, and existential dimensions, estimating 25% p...
Quantitative model estimating current US/UK regulatory capacity at 0.15-0.25 versus 0.4-0.6 threshold needed, with capacity ratio declining from 0....
Maps causal relationships between 22 AI safety parameters, identifying 7 feedback loops and 4 clusters. Finds epistemic-health and institutional-qu...
Projects authentication crisis threshold when detection accuracy falls to 50-55% (chance levels): audio by 2026-2027, images 2025-2027, video 2026-...