Analysis of coordination failures in AI development using game theory, documenting how competitive dynamics between nations (US \$109B vs China \$9...
Structured access (API-only deployment) provides meaningful safety benefits through monitoring (80-95% detection rates), intervention capability, a...
The OpenAI Foundation holds 26% equity (~\$130B) in OpenAI Group PBC with governance control, but detailed analysis of board member incentives reve...
Comprehensive analysis of coordination mechanisms for AI safety showing racing dynamics could compress safety timelines by 2-5 years, with \$500M+ ...
Comprehensive synthesis of AI-bioweapons evidence through early 2026, including the FRI expert survey finding 5x risk increase from AI capabilities...
Executive Order 14110 (Oct 2023) established compute thresholds (10^26 FLOP general, 10^23 biological) and created AISI, but was revoked after 15 m...
Provides a strategic framework for AI safety resource allocation by mapping 13+ interventions against 4 risk categories, evaluating each on ITN dim...
Tool-use restrictions provide hard limits on AI agent capabilities through defense-in-depth approaches combining permissions, sandboxing, and human...
Comprehensive analysis of compute thresholds (EU: 10^25 FLOP, US: 10^26 FLOP) as regulatory triggers for AI governance, documenting that algorithmi...
Comprehensive analysis of pause advocacy as an AI safety intervention, estimating 15-40% probability of meaningful policy implementation by 2030 wi...
Safety cases are structured arguments adapted from nuclear/aviation to justify AI system safety, with UK AISI publishing templates in 2024 and 3 of...
Capability elicitation—systematically discovering what AI models can actually do through scaffolding, prompting, and fine-tuning—reveals 2-10x perf...
Comprehensive empirical analysis of voluntary AI safety commitments showing 53% mean compliance rate across 30 indicators (ranging from 13% for App...
AI creates a "dual amplification" problem where the same systems that enable harmful actions also defeat attribution. False identity fraud rose 60%...
The US AI Safety Institute (AISI), established November 2023 within NIST with \$10M budget (FY2025 request \$82.7M), conducted pre-deployment evalu...
Identifies 35 high-leverage uncertainties in AI risk across compute (scaling breakdown at 10^26-10^30 FLOP), governance (10% P(US-China treaty by 2...
Comprehensive analysis of international AI coordination mechanisms shows growing but limited progress: 11-country AI Safety Institute network with ...
Comprehensive analysis of AI-driven agency erosion across domains: 42.3% of EU workers under algorithmic management (EWCS 2024), 70%+ of Americans ...
The November 2022 collapse of FTX resulted in approximately \$160M in committed EA grants that were not disbursed, organizational restructuring acr...
Quantitative analysis mapping 15+ AI safety interventions to specific risks reveals critical misallocation: 40% of 2024 funding (\$400M+) flows to ...
Comprehensive analysis documenting AI-enabled authoritarian tools across surveillance (350M+ cameras in China analyzing 25.9M faces daily per distr...
This article argues that government capacity to implement AI policy is critically lagging behind AI development, creating an existential risk throu...
Comprehensive survey of AI safety researcher disagreements on accident risks, quantifying probability ranges for mesa-optimization (15-55%), decept...
Comprehensive empirical analysis finds US chip export controls provide 1-3 year delays on Chinese AI development but face severe enforcement gaps (...
Leading the Future represents a \$125 million industry effort to prevent AI regulation through political spending, directly opposing AI safety advo...
Framework for prioritizing AI safety interventions by temporal urgency rather than impact alone, identifying four critical closing windows (2024-20...
Quantitative assessment estimating AI provides modest knowledge uplift for bioweapons (1.0-1.2x per RAND 2024) but more substantial evasion capabil...
Comprehensive analysis of pause/moratorium proposals finding they would provide very high safety benefits if implemented (buying time for safety re...
Comprehensive overview of AI evaluation methods spanning dangerous capability assessment, safety properties, and deception detection, with categori...
Comprehensive analysis of the February 2026 confrontation between Anthropic and the US government. Triggered when Claude AI was used in the January...
Comprehensive analysis of situational awareness in AI systems, documenting that Claude 3 Opus fakes alignment 12% baseline (78% post-RL), 5 of 6 fr...
California SB 53 represents the first U.S. state law specifically targeting frontier AI safety through transparency requirements, incident reportin...
Comprehensive quantitative analysis of 2026 hyperscaler AI capex (\$700B+ across 6 companies, 58% YoY increase), projecting \$5T cumulative through...
Comprehensive reference page on Anthropic covering financials (\$380B valuation, \$19B ARR), safety research (Constitutional AI, mechanistic interp...
Anthropic's Long-Term Benefit Trust represents an innovative but potentially limited governance mechanism where financially disinterested trustees ...
Comprehensive analysis of AI misuse cruxes with quantified evidence across bioweapons (RAND bio study found no significant difference; novice uplif...
Detailed incident report of the February 2026 OpenClaw matplotlib case, where an autonomous AI agent published a personal attack blog post ~30-40 m...
Comprehensive analysis of AI standards bodies (ISO/IEC, IEEE, NIST, CEN-CENELEC) showing how voluntary technical standards become de facto requirem...
Analysis of agentic AI capabilities and deployment challenges, documenting industry forecasts (40% of enterprise apps by 2026, \$199B market by 203...
Technical AI safety research encompasses six major agendas (mechanistic interpretability, scalable oversight, AI control, evaluations, agent founda...
This model estimates alignment robustness degrades from 50-65% at GPT-4 level to 15-30% at 100x capability, with a critical 'alignment valley' at 1...
Comprehensive survey of AI scientific research capabilities across biology, chemistry, materials science, and automated research, documenting key b...
METR conducts pre-deployment dangerous capability evaluations for frontier AI labs (OpenAI, Anthropic, Google DeepMind), testing autonomous replica...
Game-theoretic model identifying three equilibria for AI lab safety culture: racing-dominant (current state, S=0.25), safety-competitive (S>0.6), a...
Comprehensive analysis of LLM capabilities showing rapid progress from GPT-2 (1.5B parameters, 2019) to GPT-5 and Gemini 2.5 (2025), with training ...
Provable Safe AI uses formal verification to provide mathematical safety guarantees, with UK's ARIA investing £59M through 2028. Current verificati...
Analyzes 12 key uncertainties about AI structural risks across power concentration, coordination feasibility, and institutional adaptation. Provide...
A comprehensive structured mapping of AI safety solution uncertainties across technical, alignment, governance, and agentic domains, using probabil...
Analyzes when AI safety measures conflict with capabilities, finding most interventions impose 5-15% capability cost but RLHF actually improves usa...
Analysis finds AI safety research suffers 30-50% efficiency losses from industry dominance (60-70% of ~\$700M annually), with critical areas like m...
All six major AI infrastructure spenders (Amazon, Alphabet, Microsoft, Meta, Oracle, xAI) are US companies subject to CLOUD Act and FISA 702, givin...
Major AI labs invest \$300-500M annually in safety (5-10% of R&D) through responsible scaling policies and dedicated teams, but face 30-40% safety ...
The OpenAI Foundation holds Class N shares giving it exclusive power to appoint/remove all OpenAI Group PBC board members. However, 7 of 8 Foundati...
Systematic framework for quantifying AI risk interactions, finding 15-25% of risk pairs strongly interact with coefficients +0.2 to +2.0, causing p...
Quantitative model of AI capability diffusion across 5 actor tiers, documenting compression from 24-36 months (2020) to 12-18 months (2024) with pr...
Comprehensive assessment of LLM capabilities showing training costs growing 2.4x/year (\$78-191M for frontier models, though DeepSeek achieved near...
This worldview argues governance/coordination is the bottleneck for AI safety (not just technical solutions), estimating 10-30% P(doom) by 2100. Ev...
Comprehensive biographical profile of Max Tegmark covering his transition from cosmology to AI safety advocacy, his role founding the Future of Lif...
Analysis of government AI Safety Institutes finding they've achieved rapid institutional growth (UK: 0→100+ staff in 18 months) and secured pre-dep...
Analyzes two compute monitoring approaches: cloud KYC (implementable in 1-2 years, covers ~60% of frontier training via AWS/Azure/Google) and hardw...