Longterm Wiki

forecastingcapabilitiesai-development

6.5k words

AI Timelines

Forecasts and debates about when transformative AI capabilities will be developed

x-riskcatastrophic-risklongtermism

Existential Risk from AI

Hypotheses concerning risks from advanced AI systems that some researchers believe could result in human extinction or permanent global catastrophe...

superintelligenceagix-risk

1.6k words

Superintelligence

AI systems with cognitive abilities vastly exceeding human intelligence

capabilitiesresearchforecasting

2.5k words

AI Scaling Laws

Empirical relationships between compute, data, parameters, and AI performance

alignmentscalable-oversightrlhf

5.7k words

AI Alignment

Comprehensive review of AI alignment approaches finding current methods (RLHF, Constitutional AI) show 75%+ effectiveness on measurable safety metr...

game-theorycoordinationcompetition

3.9k words

Multipolar Trap (AI Development)

Analysis of coordination failures in AI development using game theory, documenting how competitive dynamics between nations (US $109B vs China $9.3...

4.5k words

Optimistic Alignment Worldview

Comprehensive overview of the optimistic AI alignment worldview, estimating under 5% existential risk by 2100 based on beliefs that alignment is tr...

cybersecurityinformation-warfarecritical-infrastructure

4.3k words

Cyberweapons Risk

Comprehensive analysis showing AI-enabled cyberweapons represent a present, high-severity threat with GPT-4 exploiting 87% of one-day vulnerabiliti...

deployment-safetyapi-accessproliferation-control

3.5k words

Structured Access / API-Only

Structured access (API-only deployment) provides meaningful safety benefits through monitoring (80-95% detection rates), intervention capability, a...

human-agencyautomationdependence

AI-Induced Enfeeblement

Documents the gradual risk of humanity losing critical capabilities through AI dependency. Key findings: GPS users show 23% navigation decline (Nat...

Overview

4.7k words

Frontier AI Labs (Overview)

Comprehensive comparative overview of 7 major frontier AI labs covering safety frameworks, governance structures, competitive dynamics, and policy ...

nonprofit-governanceai-philanthropycorporate-structure

9.0k words

OpenAI Foundation

The OpenAI Foundation holds 26% equity (~$130B) in OpenAI Group PBC with governance control, but detailed analysis of board member incentives revea...

game-theorygovernanceinternational-cooperation

AI Governance Coordination Technologies

Comprehensive analysis of coordination mechanisms for AI safety showing racing dynamics could compress safety timelines by 2-5 years, with $500M+ g...

AI Model Steganography

Comprehensive analysis of AI steganography risks - systems hiding information in outputs to enable covert coordination or evade oversight. GPT-4 cl...

Overview

AI Governance & Policy (Overview)

Comprehensive reference overview of AI governance mechanisms across jurisdictions as of mid-2026, covering mandatory legislation, compute controls,...

biosecuritydual-use-researchx-risk

10.8k words

Bioweapons Risk

Comprehensive synthesis of AI-bioweapons evidence through early 2026, including the FRI expert survey finding 5x risk increase from AI capabilities...

Policy

4.5k words

US Executive Order on Safe, Secure, and Trustworthy AI

Executive Order 14110 (Oct 2023) established compute thresholds (10^26 FLOP general, 10^23 biological) and created AISI, but was revoked after 15 m...

thresholdsgovernanceus-aisi

human-ai-interactionai-controldecision-making

AI-Human Hybrid Systems

Hybrid AI-human systems achieve 15-40% error reduction across domains through six design patterns, with evidence from Meta (23% false positive redu...

resource-allocationfield-analysisfunding

2.8k words

AI Safety Intervention Portfolio

Provides a strategic framework for AI safety resource allocation by mapping 13+ interventions against 4 risk categories, evaluating each on ITN dim...

schemingdeception-detectionbehavioral-testing

3.3k words

Scheming & Deception Detection

Reviews empirical evidence that frontier models (o1, Claude 3.5, Gemini 1.5) exhibit in-context scheming capabilities at rates of 0.3-13%, includin...

containmentdefense-in-depthagent-safety

4.3k words

Sandboxing / Containment

Comprehensive analysis of AI sandboxing as defense-in-depth, synthesizing METR's 2025 evaluations (GPT-5 time horizon ~2h, capabilities doubling ev...

agent-safetycapability-restrictionsdefense-in-depth

3.9k words

Tool-Use Restrictions

Tool-use restrictions provide hard limits on AI agent capabilities through defense-in-depth approaches combining permissions, sandboxing, and human...

thresholdsregulationeu-ai-act

Compute Thresholds

Comprehensive analysis of compute thresholds (EU: 10^25 FLOP, US: 10^26 FLOP) as regulatory triggers for AI governance, documenting that algorithmi...

pausedevelopment-moratoriumpolitical-advocacy

5.3k words

Pause Advocacy

Comprehensive analysis of pause advocacy as an AI safety intervention, estimating 15-40% probability of meaningful policy implementation by 2030 wi...

safety-casesgovernancedeployment-decisions

4.1k words

AI Safety Cases

Safety cases are structured arguments adapted from nuclear/aviation to justify AI system safety, with UK AISI publishing templates in 2024 and 3 of...

elicitationsandbaggingscaffolding

3.5k words

Capability Elicitation

Capability elicitation—systematically discovering what AI models can actually do through scaffolding, prompting, and fine-tuning—reveals 2-10x perf...

Policy

4.6k words

Voluntary AI Safety Commitments

Comprehensive empirical analysis of voluntary AI safety commitments showing 53% mean compliance rate across 30 indicators (ranging from 13% for App...

self-regulationindustry-commitmentsresponsible-scaling

governancegovernment-oversightai-standards

4.8k words

US AI Safety Institute (now CAISI)

The US AI Safety Institute (AISI), established November 2023 within NIST with $10M budget (FY2025 request $82.7M), conducted pre-deployment evaluat...

2.8k words

AI-Enabled Untraceable Misuse

AI creates a "dual amplification" problem where the same systems that enable harmful actions also defeat attribution. False identity fraud rose 60%...

4.9k words

EA Epistemic Failures in the FTX Era

This page synthesizes post-FTX critiques of EA's epistemic and governance failures, identifying interlocking problems including donor hero-worship,...

scientific-integritypaper-millsreplication-crisis

1.9k words

Scientific Knowledge Corruption

Documents AI-enabled scientific fraud with evidence that 2-20% of submissions are from paper mills (field-dependent), 300,000+ fake papers exist, a...

computescalinginfrastructure

3.5k words

AI Compute Scaling Metrics

AI training compute is growing at ~4-5× per year with algorithmic efficiency improving ~3× per year (halving effective compute cost every ~8 months...

geopoliticsus-china-competitionopen-source-ai

4.1k words

AI Safety Multi-Actor Strategic Landscape

Analyzes AI risk through the lens of which actors develop TAI, finding actor identity may account for 40-60% of total risk variance, with detailed ...

adversarial-robustnessml-safetybenchmarking

3.2k words

FAR AI

FAR AI is an AI safety research nonprofit founded in July 2022 by Adam Gleave (CEO) and Karl Berzins (Co-founder & President). Based in Berkeley, C...

alignment-theorydeception-detectionbelief-extraction

2.5k words

Eliciting Latent Knowledge (ELK)

Comprehensive analysis of the Eliciting Latent Knowledge problem with quantified research metrics: ARC's prize contest received 197 proposals, awar...

international-coordinationmultilateral-treatiesai-safety-institutes

4.1k words

International Coordination Mechanisms

Comprehensive analysis of international AI coordination mechanisms shows growing but limited progress: 11-country AI Safety Institute network with ...

uncertainty-analysisscaling-lawscompute-governance

Deepfake Detection

Comprehensive analysis of deepfake detection showing best commercial detectors achieve 78-87% in-the-wild accuracy vs 96%+ in controlled settings, ...

Crux

2.6k words

AI Risk Critical Uncertainties Model

Identifies 35 high-leverage uncertainties in AI risk across compute (scaling breakdown at 10^26-10^30 FLOP), governance (10% P(US-China treaty by 2...

interpretabilityfeature-extractionmonosemanticity

3.2k words

Sparse Autoencoders (SAEs)

Comprehensive review of sparse autoencoders (SAEs) for mechanistic interpretability, covering Anthropic's 34M features from Claude 3 Sonnet (90% in...

weak-to-strongscalable-oversightsuperalignment

Weak-to-Strong Generalization

Weak-to-strong generalization tests whether weak supervisors can elicit good behavior from stronger AI systems. OpenAI's ICML 2024 experiments show...

human-agencyautonomymanipulation

1.8k words

Erosion of Human Agency

Comprehensive analysis of AI-driven agency erosion across domains: 42.3% of EU workers under algorithmic management (EWCS 2024), 70%+ of Americans ...

2.3k words

State Capacity and AI Governance

This article argues that government capacity to implement AI policy is critically lagging behind AI development, creating an existential risk throu...

interventionseffectivenessprioritization

4.2k words

AI Safety Intervention Effectiveness Matrix

Quantitative analysis mapping 15+ AI safety interventions to specific risks reveals critical misallocation: 40% of 2024 funding ($400M+) flows to R...

authoritarianismhuman-rightsdigital-repression

AI Authoritarian Tools

Comprehensive analysis documenting AI-enabled authoritarian tools across surveillance (350M+ cameras in China analyzing 25.9M faces daily per distr...

robustnessgeneralizationml-safety

3.6k words

AI Distributional Shift

Comprehensive analysis of distributional shift showing 40-45% accuracy drops when models encounter novel distributions (ObjectNet vs ImageNet), wit...

5.7k words

FTX Collapse: Lessons for EA Funding Resilience

The November 2022 collapse of FTX resulted in approximately $160M in committed EA grants that were not disbursed, organizational restructuring acro...

specification-gaminggoodharts-lawouter-alignment

Reward Hacking

Comprehensive analysis showing reward hacking occurs in 1-2% of OpenAI o3 task attempts, with 43x higher rates when scoring functions are visible. ...

4.7k words

Long-Timelines Technical Worldview

Comprehensive overview of the long-timelines worldview (20-40+ years to AGI, 5-20% P(doom)), arguing for foundational research over rushed solution...

1.9k words

Compute Supply Chain Actors

A well-structured, comprehensive overview of AI compute supply chain actors across six tiers, identifying key governance chokepoints (ASML, TSMC, N...

ai-governancecompute

Policy

4.2k words

US AI Chip Export Controls

Comprehensive empirical analysis finds US chip export controls provide 1-3 year delays on Chinese AI development but face severe enforcement gaps (...

value-comparisonlongtermismeffective-altruism

2.5k words

Relative Longtermist Value Comparisons

Relative value framework comparing longtermist funding vehicles to a GiveWell reference. Key findings: (1) Coefficient Navigating TAI Fund is ~50x–...

prioritizationtimingstrategy

4.4k words

Intervention Timing Windows

Framework for prioritizing AI safety interventions by temporal urgency rather than impact alone, identifying four critical closing windows (2024-20...

political-advocacysuper-pacai-regulation

2.3k words

Leading the Future super PAC

Leading the Future represents a $125 million industry effort to prevent AI regulation through political spending, directly opposing AI safety advoc...

Internal

2.2k words

Citation Architecture: Current State & Unified Proposal

Technical architecture document analyzing the wiki's overlapping citation systems (remark-gfm footnotes, References component, CitationOverlay, Res...

Crux

6.4k words

AI Accident Risk Cruxes

Comprehensive survey of AI safety researcher disagreements on accident risks, quantifying probability ranges for mesa-optimization (15-55%), decept...

mesa-optimizationdeceptive-alignmentsituational-awareness

Research Area

3.1k words

AI Control

AI Control is a defensive safety approach that maintains control over potentially misaligned AI through monitoring, containment, and redundancy, of...

ai-controlsafety-research

nonprofit-governancecorporate-structureai-safety

Anthropic Long-Term Benefit Trust

Comprehensive reference page on Anthropic's LTBT governance mechanism, covering legal structure, trustee composition, amendment processes, and crit...

deceptionsituational-awarenessstrategic-deception

5.1k words

Scheming

Scheming—strategic AI deception during training—has transitioned from theoretical concern to observed behavior across all major frontier models (o1...

moratoriumdevelopment-pausecoordination

2.0k words

Pause / Moratorium

Comprehensive analysis of pause/moratorium proposals finding they would provide very high safety benefits if implemented (buying time for safety re...