Analysis of coordination failures in AI development using game theory, documenting how competitive dynamics between nations (US \$109B vs China \$9...
Structured access (API-only deployment) provides meaningful safety benefits through monitoring (80-95% detection rates), intervention capability, a...
The OpenAI Foundation holds 26% equity (~\$130B) in OpenAI Group PBC with governance control, but detailed analysis of board member incentives reve...
Comprehensive analysis of coordination mechanisms for AI safety showing racing dynamics could compress safety timelines by 2-5 years, with \$500M+ ...
Comprehensive synthesis of AI-bioweapons evidence through early 2026, including the FRI expert survey finding 5x risk increase from AI capabilities...
Provides a strategic framework for AI safety resource allocation by mapping 13+ interventions against 4 risk categories, evaluating each on ITN dim...
Tool-use restrictions provide hard limits on AI agent capabilities through defense-in-depth approaches combining permissions, sandboxing, and human...
Executive Order 14110 (Oct 2023) established compute thresholds (10^26 FLOP general, 10^23 biological) and created AISI, but was revoked after 15 m...
Comprehensive analysis of compute thresholds (EU: 10^25 FLOP, US: 10^26 FLOP) as regulatory triggers for AI governance, documenting that algorithmi...
Comprehensive analysis of pause advocacy as an AI safety intervention, estimating 15-40% probability of meaningful policy implementation by 2030 wi...
Safety cases are structured arguments adapted from nuclear/aviation to justify AI system safety, with UK AISI publishing templates in 2024 and 3 of...
Capability elicitation—systematically discovering what AI models can actually do through scaffolding, prompting, and fine-tuning—reveals 2-10x perf...
Comprehensive empirical analysis of voluntary AI safety commitments showing 53% mean compliance rate across 30 indicators (ranging from 13% for App...
AI creates a "dual amplification" problem where the same systems that enable harmful actions also defeat attribution. False identity fraud rose 60%...
The US AI Safety Institute (AISI), established November 2023 within NIST with \$10M budget (FY2025 request \$82.7M), conducted pre-deployment evalu...
Comprehensive analysis of international AI coordination mechanisms shows growing but limited progress: 11-country AI Safety Institute network with ...
Comprehensive analysis of AI-driven agency erosion across domains: 42.3% of EU workers under algorithmic management (EWCS 2024), 70%+ of Americans ...
Comprehensive analysis documenting AI-enabled authoritarian tools across surveillance (350M+ cameras in China analyzing 25.9M faces daily per distr...
Quantitative analysis mapping 15+ AI safety interventions to specific risks reveals critical misallocation: 40% of 2024 funding (\$400M+) flows to ...
Comprehensive empirical analysis finds US chip export controls provide 1-3 year delays on Chinese AI development but face severe enforcement gaps (...
Framework for prioritizing AI safety interventions by temporal urgency rather than impact alone, identifying four critical closing windows (2024-20...
Comprehensive analysis of key uncertainties determining optimal AI safety resource allocation across technical verification (25-40% believe AI dete...
The November 2022 collapse of FTX resulted in approximately \$160M in committed EA grants that were not disbursed, organizational restructuring acr...
Identifies 35 high-leverage uncertainties in AI risk across compute (scaling breakdown at 10^26-10^30 FLOP), governance (10% P(US-China treaty by 2...
Leading the Future represents a \$125 million industry effort to prevent AI regulation through political spending, directly opposing AI safety advo...
This article argues that government capacity to implement AI policy is critically lagging behind AI development, creating an existential risk throu...
AI Control is a defensive safety approach that maintains control over potentially misaligned AI through monitoring, containment, and redundancy, of...
Comprehensive analysis of pause/moratorium proposals finding they would provide very high safety benefits if implemented (buying time for safety re...
Comprehensive overview of AI evaluation methods spanning dangerous capability assessment, safety properties, and deception detection, with categori...
California SB 53 represents the first U.S. state law specifically targeting frontier AI safety through transparency requirements, incident reportin...
Comprehensive survey of AI safety researcher disagreements on accident risks, quantifying probability ranges for mesa-optimization (15-55%), decept...
Comprehensive quantitative analysis of 2026 hyperscaler AI capex (\$700B+ across 6 companies, 58% YoY increase), projecting \$5T cumulative through...
Comprehensive analysis of situational awareness in AI systems, documenting that Claude 3 Opus fakes alignment 12% baseline (78% post-RL), 5 of 6 fr...
Comprehensive analysis of AI standards bodies (ISO/IEC, IEEE, NIST, CEN-CENELEC) showing how voluntary technical standards become de facto requirem...
Comprehensive analysis of the February 2026 confrontation between Anthropic and the US government. Triggered when Claude AI was used in the January...
Anthropic's Long-Term Benefit Trust represents an innovative but potentially limited governance mechanism where financially disinterested trustees ...
Quantitative assessment estimating AI provides modest knowledge uplift for bioweapons (1.0-1.2x per RAND 2024) but concerning evasion capabilities ...
Analyzes 12 key uncertainties about AI structural risks across power concentration, coordination feasibility, and institutional adaptation. Provide...
Technical AI safety research encompasses six major agendas (mechanistic interpretability, scalable oversight, AI control, evaluations, agent founda...
Detailed incident report of the February 2026 OpenClaw matplotlib case, where an autonomous AI agent published a personal attack blog post ~30-40 m...
Analysis finds AI safety research suffers 30-50% efficiency losses from industry dominance (60-70% of ~\$700M annually), with critical areas like m...
METR conducts pre-deployment dangerous capability evaluations for frontier AI labs (OpenAI, Anthropic, Google DeepMind), testing autonomous replica...
Game-theoretic model identifying three equilibria for AI lab safety culture: racing-dominant (current state, S=0.25), safety-competitive (S>0.6), a...
Systematic framework for quantifying AI risk interactions, finding 15-25% of risk pairs strongly interact with coefficients +0.2 to +2.0, causing p...
Provable Safe AI uses formal verification to provide mathematical safety guarantees, with UK's ARIA investing £59M through 2028. Current verificati...
Quantitative model of AI capability diffusion across 5 actor tiers, documenting compression from 24-36 months (2020) to 12-18 months (2024) with pr...
Analysis of agentic AI capabilities and deployment challenges, documenting industry forecasts (40% of enterprise apps by 2026, \$199B market by 203...
Comprehensive survey of AI scientific research capabilities across biology, chemistry, materials science, and automated research, documenting key b...
This model estimates alignment robustness degrades from 50-65% at GPT-4 level to 15-30% at 100x capability, with a critical 'alignment valley' at 1...
Analyzes when AI safety measures conflict with capabilities, finding most interventions impose 5-15% capability cost but RLHF actually improves usa...
Major AI labs invest \$300-500M annually in safety (5-10% of R&D) through responsible scaling policies and dedicated teams, but face 30-40% safety ...
Comprehensive analysis of 13 AI misuse cruxes with quantified evidence showing mixed uplift (RAND bio study found no significant difference, but cy...
Analysis of government AI Safety Institutes finding they've achieved rapid institutional growth (UK: 0→100+ staff in 18 months) and secured pre-dep...
The OpenAI Foundation holds Class N shares giving it exclusive power to appoint/remove all OpenAI Group PBC board members. However, 7 of 8 Foundati...
Analyzes two compute monitoring approaches: cloud KYC (implementable in 1-2 years, covers ~60% of frontier training via AWS/Azure/Google) and hardw...
All six major AI infrastructure spenders (Amazon, Alphabet, Microsoft, Meta, Oracle, xAI) are US companies subject to CLOUD Act and FISA 702, givin...
Comprehensive assessment of LLM capabilities showing training costs growing 2.4x/year (\$78-191M for frontier models, though DeepSeek achieved near...
Quantitative pipeline model finds only 200-400 ML researchers transition to safety work annually (far below 1,000-2,000 needed), with 60-75% blocke...
Mathematical framework showing independent AI safety layers with 20-60% individual failure rates can achieve 1-3% combined failure, but deceptive a...
This worldview argues governance/coordination is the bottleneck for AI safety (not just technical solutions), estimating 10-30% P(doom) by 2100. Ev...