Longterm Wiki

deployment-safetyapi-accessproliferation-control

Structured Access / API-Only

Structured access (API-only deployment) provides meaningful safety benefits through monitoring (80-95% detection rates), intervention capability, a...

nonprofit-governanceai-philanthropycorporate-structure

9.0k words

OpenAI Foundation

The OpenAI Foundation holds 26% equity (~\$130B) in OpenAI Group PBC with governance control, but detailed analysis of board member incentives reve...

game-theorygovernanceinternational-cooperation

2.9k words

AI Governance Coordination Technologies

Comprehensive analysis of coordination mechanisms for AI safety showing racing dynamics could compress safety timelines by 2-5 years, with \$500M+ ...

biosecuritydual-use-researchx-risk

10.8k words

Bioweapons Risk

Comprehensive synthesis of AI-bioweapons evidence through early 2026, including the FRI expert survey finding 5x risk increase from AI capabilities...

thresholdsgovernanceus-aisi

4.5k words

US Executive Order on Safe, Secure, and Trustworthy AI

Executive Order 14110 (Oct 2023) established compute thresholds (10^26 FLOP general, 10^23 biological) and created AISI, but was revoked after 15 m...

resource-allocationfield-analysisfunding

2.8k words

AI Safety Intervention Portfolio

Provides a strategic framework for AI safety resource allocation by mapping 13+ interventions against 4 risk categories, evaluating each on ITN dim...

agent-safetycapability-restrictionsdefense-in-depth

3.9k words

Tool-Use Restrictions

Tool-use restrictions provide hard limits on AI agent capabilities through defense-in-depth approaches combining permissions, sandboxing, and human...

thresholdsregulationeu-ai-act

4.0k words

Compute Thresholds

Comprehensive analysis of compute thresholds (EU: 10^25 FLOP, US: 10^26 FLOP) as regulatory triggers for AI governance, documenting that algorithmi...

pausedevelopment-moratoriumpolitical-advocacy

5.3k words

Pause Advocacy

Comprehensive analysis of pause advocacy as an AI safety intervention, estimating 15-40% probability of meaningful policy implementation by 2030 wi...

safety-casesgovernancedeployment-decisions

4.1k words

AI Safety Cases

Safety cases are structured arguments adapted from nuclear/aviation to justify AI system safety, with UK AISI publishing templates in 2024 and 3 of...

elicitationsandbaggingscaffolding

Capability Elicitation

Capability elicitation—systematically discovering what AI models can actually do through scaffolding, prompting, and fine-tuning—reveals 2-10x perf...

self-regulationindustry-commitmentsresponsible-scaling

4.6k words

Voluntary AI Safety Commitments

Comprehensive empirical analysis of voluntary AI safety commitments showing 53% mean compliance rate across 30 indicators (ranging from 13% for App...

2.8k words

AI-Enabled Untraceable Misuse

AI creates a "dual amplification" problem where the same systems that enable harmful actions also defeat attribution. False identity fraud rose 60%...

governancegovernment-oversightai-standards

4.8k words

US AI Safety Institute

The US AI Safety Institute (AISI), established November 2023 within NIST with \$10M budget (FY2025 request \$82.7M), conducted pre-deployment evalu...

uncertainty-analysisscaling-lawscompute-governance

AI Risk Critical Uncertainties Model

Identifies 35 high-leverage uncertainties in AI risk across compute (scaling breakdown at 10^26-10^30 FLOP), governance (10% P(US-China treaty by 2...

international-coordinationmultilateral-treatiesai-safety-institutes

4.1k words

International Coordination Mechanisms

Comprehensive analysis of international AI coordination mechanisms shows growing but limited progress: 11-country AI Safety Institute network with ...

human-agencyautonomymanipulation

1.8k words

Erosion of Human Agency

Comprehensive analysis of AI-driven agency erosion across domains: 42.3% of EU workers under algorithmic management (EWCS 2024), 70%+ of Americans ...

5.7k words

FTX Collapse: Lessons for EA Funding Resilience

The November 2022 collapse of FTX resulted in approximately \$160M in committed EA grants that were not disbursed, organizational restructuring acr...

interventionseffectivenessprioritization

4.2k words

AI Safety Intervention Effectiveness Matrix

Quantitative analysis mapping 15+ AI safety interventions to specific risks reveals critical misallocation: 40% of 2024 funding (\$400M+) flows to ...

authoritarianismhuman-rightsdigital-repression

2.9k words

AI Authoritarian Tools

Comprehensive analysis documenting AI-enabled authoritarian tools across surveillance (350M+ cameras in China analyzing 25.9M faces daily per distr...

2.2k words

State Capacity and AI Governance

This article argues that government capacity to implement AI policy is critically lagging behind AI development, creating an existential risk throu...

mesa-optimizationdeceptive-alignmentsituational-awareness

6.4k words

AI Accident Risk Cruxes

Comprehensive survey of AI safety researcher disagreements on accident risks, quantifying probability ranges for mesa-optimization (15-55%), decept...

4.1k words

US AI Chip Export Controls

Comprehensive empirical analysis finds US chip export controls provide 1-3 year delays on Chinese AI development but face severe enforcement gaps (...

political-advocacysuper-pacai-regulation

2.3k words

Leading the Future super PAC

Leading the Future represents a \$125 million industry effort to prevent AI regulation through political spending, directly opposing AI safety advo...

prioritizationtimingstrategy

4.4k words

Intervention Timing Windows

Framework for prioritizing AI safety interventions by temporal urgency rather than impact alone, identifying four critical closing windows (2024-20...

upliftcomparisonbioweapons

7.9k words

AI Uplift Assessment Model

Quantitative assessment estimating AI provides modest knowledge uplift for bioweapons (1.0-1.2x per RAND 2024) but more substantial evasion capabil...

moratoriumdevelopment-pausecoordination

2.0k words

Pause / Moratorium

Comprehensive analysis of pause/moratorium proposals finding they would provide very high safety benefits if implemented (buying time for safety re...

evaluationsafety-testingdeployment-decisions

1.7k words

AI Evaluation

Comprehensive overview of AI evaluation methods spanning dangerous capability assessment, safety properties, and deception detection, with categori...

Event

3.3k words

Anthropic-Pentagon Standoff (2026)

Comprehensive analysis of the February 2026 confrontation between Anthropic and the US government. Triggered when Claude AI was used in the January...

ai-policymilitary-aigovernment

deceptionself-awarenessevaluations

3.6k words

Situational Awareness

Comprehensive analysis of situational awareness in AI systems, documenting that Claude 3 Opus fakes alignment 12% baseline (78% post-RL), 5 of 6 fr...

regulationstate-policyfrontier-models

2.5k words

California SB 53

California SB 53 represents the first U.S. state law specifically targeting frontier AI safety through transparency requirements, incident reportin...

computeinfrastructurecapex

6.0k words

Projecting Compute Spending

Comprehensive quantitative analysis of 2026 hyperscaler AI capex (\$700B+ across 6 companies, 58% YoY increase), projecting \$5T cumulative through...

constitutional-airlhfinterpretability

5.1k words

Anthropic

Comprehensive reference page on Anthropic covering financials (\$380B valuation, \$19B ARR), safety research (Constitutional AI, mechanistic interp...

anthropicgovernancetrust-structure

2.4k words

Long-Term Benefit Trust (Anthropic)

Anthropic's Long-Term Benefit Trust represents an innovative but potentially limited governance mechanism where financially disinterested trustees ...

misusebioweaponscybersecurity

5.9k words

AI Misuse Risk Cruxes

Comprehensive analysis of AI misuse cruxes with quantified evidence across bioweapons (RAND bio study found no significant difference; novice uplif...

Historical

OpenClaw Matplotlib Incident (2026)

Detailed incident report of the February 2026 OpenClaw matplotlib case, where an autonomous AI agent published a personal attack blog post ~30-40 m...

AI Standards Development

Comprehensive analysis of AI standards bodies (ISO/IEC, IEEE, NIST, CEN-CENELEC) showing how voluntary technical standards become de facto requirem...

tool-useagenticcomputer-use

8.8k words

Agentic AI

Analysis of agentic AI capabilities and deployment challenges, documenting industry forecasts (40% of enterprise apps by 2026, \$199B market by 203...

interpretabilityscalable-oversightrlhf

3.8k words

Technical AI Safety Research

Technical AI safety research encompasses six major agendas (mechanistic interpretability, scalable oversight, AI control, evaluations, agent founda...

alignmentscalingtrajectories

3.2k words

Alignment Robustness Trajectory Model

This model estimates alignment robustness degrades from 50-65% at GPT-4 level to 15-30% at 100x capability, with a critical 'alignment valley' at 1...

alphafolddrug-discoveryscientific-ai

5.8k words

Scientific Research Capabilities

Comprehensive survey of AI scientific research capabilities across biology, chemistry, materials science, and automated research, documenting key b...

evaluationsdangerous-capabilitiesautonomous-replication

4.4k words

METR

METR conducts pre-deployment dangerous capability evaluations for frontier AI labs (OpenAI, Anthropic, Google DeepMind), testing autonomous replica...

equilibriumsafety-culturegame-theory

2.1k words

AI Safety Culture Equilibrium Model

Game-theoretic model identifying three equilibria for AI lab safety culture: racing-dominant (current state, S=0.25), safety-competitive (S>0.6), a...

foundation-modelstransformersscaling

8.5k words

Large Language Models

Comprehensive analysis of LLM capabilities showing rapid progress from GPT-2 (1.5B parameters, 2019) to GPT-5 and Gemini 2.5 (2025), with training ...

formal-verificationmathematical-guaranteesaria

2.5k words

Provable / Guaranteed Safe AI

Provable Safe AI uses formal verification to provide mathematical safety guarantees, with UK's ARIA investing £59M through 2028. Current verificati...

power-concentrationlock-inracing-dynamics

3.6k words

AI Structural Risk Cruxes

Analyzes 12 key uncertainties about AI structural risks across power concentration, coordination feasibility, and institutional adaptation. Provide...

verificationcoordinationepistemic-infrastructure

5.0k words

AI Safety Solution Cruxes

A comprehensive structured mapping of AI safety solution uncertainties across technical, alignment, governance, and agentic domains, using probabil...

tradeoffssafetycapabilities

5.8k words

Safety-Capability Tradeoff Model

Analyzes when AI safety measures conflict with capabilities, finding most interventions impose 5-15% capability cost but RLHF actually improves usa...

resource-allocationresearch-prioritiesoptimization

1.4k words

AI Safety Research Allocation Model

Analysis finds AI safety research suffers 30-50% efficiency losses from industry dominance (60-70% of ~\$700M annually), with critical areas like m...

computegovernanceconcentration

2.3k words

Compute Concentration

All six major AI infrastructure spenders (Amazon, Alphabet, Microsoft, Meta, Oracle, xAI) are US companies subject to CLOUD Act and FISA 702, givin...

corporate-safetysafety-teamsvoluntary-commitments

1.3k words

Corporate AI Safety Responses

Major AI labs invest \$300-500M annually in safety (5-10% of R&D) through responsible scaling policies and dedicated teams, but face 30-40% safety ...

openaigovernancenonprofit-structure

OpenAI Foundation Governance Paradox

The OpenAI Foundation holds Class N shares giving it exclusive power to appoint/remove all OpenAI Group PBC board members. However, 7 of 8 Foundati...

risk-interactionscompounding-riskssystems-thinking

AI Risk Interaction Matrix

Systematic framework for quantifying AI risk interactions, finding 15-25% of risk pairs strongly interact with coefficients +0.2 to +2.0, causing p...

risk-factordiffusioncontrol

1.9k words

AI Proliferation Risk Model

Quantitative model of AI capability diffusion across 5 actor tiers, documenting compression from 24-36 months (2020) to 12-18 months (2024) with pr...

transformerstraining-costsscheming

3.7k words

Large Language Models

Comprehensive assessment of LLM capabilities showing training costs growing 2.4x/year (\$78-191M for frontier models, though DeepSeek achieved near...

governanceregulationcompute-governance

3.9k words

Governance-Focused Worldview

This worldview argues governance/coordination is the bottleneck for AI safety (not just technical solutions), estimating 10-30% P(doom) by 2100. Ev...

Person

ai-safety-advocacyfuture-of-life-instituteai-pause

Max Tegmark

Comprehensive biographical profile of Max Tegmark covering his transition from cosmology to AI safety advocacy, his role founding the Future of Lif...

4.2k words

AI Safety Institutes (AISIs)

Analysis of government AI Safety Institutes finding they've achieved rapid institutional growth (UK: 0→100+ staff in 18 months) and secured pre-dep...