Longterm Wiki · Updated 2026-03-13
Summary

Provides a strategic framework for AI safety resource allocation by mapping 13+ interventions against 4 risk categories, evaluating each on ITN dimensions, and identifying portfolio gaps (epistemic resilience severely neglected, technical work over-concentrated in frontier labs). Total field investment ~$650M annually with 1,100 FTEs (21% annual growth), but 85% of external funding from 5 sources and safety/capabilities ratio at only 0.5-1.3%. Recommends rebalancing from very high RLHF investment toward evaluations (very high priority), AI control and compute governance (both high priority), with epistemic resilience increasing from very low to medium allocation.

AI Safety Intervention Portfolio

Approach


Related
Organizations
Coefficient Giving
Policies
Responsible Scaling Policies (RSPs)
Concepts
Interpretability · AI Evaluations

Quick Assessment

| Dimension | Assessment | Evidence |
|---|---|---|
| Tractability | Medium-High | Varies widely: evaluations (high), compute governance (high), international coordination (low). Coefficient Giving's 2025 RFP allocated $40M for technical safety research. |
| Scalability | High | Portfolio approach scales across 4 risk categories and multiple timelines. AI Safety Field Growth Analysis shows 21% annual FTE growth rate. |
| Current Maturity | Medium | Core interventions established; significant gaps in epistemic resilience (less than 5% of portfolio) and post-incident recovery (under 1%). |
| Research Workforce | ≈1,100 FTEs | 600 technical + 500 non-technical AI safety FTEs in 2025, up from 400 total in 2022 (AI Safety Field Growth Analysis). |
| Time Horizon | Near-Long | Near-term work (evaluations, control) complements long-term work (interpretability, governance). International AI Safety Report 2025 emphasizes urgency. |
| Funding Level | $110-130M/year external | 2024 external funding. Early 2025 shows 40-50% acceleration, with $67M committed through July. Internal lab spending adds $500-550M for ≈$650M total (Coefficient Giving analysis). |
| Funding Concentration | 85% from 5 sources | Coefficient Giving: $63.6M (60%); Jaan Tallinn: $20M; Eric Schmidt: $10M; AI Safety Fund: $10M; FLI: $5M |
| Safety/Capabilities Ratio | ≈0.5-1.3% | $600-650M safety vs $50B+ capabilities spending. FAS recommends 30% of compute for safety research. |

| Source | Link |
|---|---|
| Official Website | mop.wiki |
| Wikipedia | en.wikipedia.org |
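The safety/capabilities ratio in the table is simple division; the sketch below reproduces the 0.5-1.3% bounds, assuming capabilities spending between roughly $50B and $120B (the $120B upper figure is an assumption chosen to recover the 0.5% lower bound, not a number from this page):

```python
# Rough safety-vs-capabilities spending ratio, all figures in $M.
# Safety totals come from the table above; the $120B capabilities
# upper bound is an illustrative assumption.
safety_low, safety_high = 600, 650
capabilities_low, capabilities_high = 50_000, 120_000

ratio_high = safety_high / capabilities_low   # best case for safety
ratio_low = safety_low / capabilities_high    # worst case for safety

print(f"safety/capabilities ratio: {ratio_low:.1%} - {ratio_high:.1%}")
```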

Overview

This page provides a strategic view of the AI safety intervention landscape, analyzing how different interventions address different risk categories. Rather than examining interventions individually, this portfolio view helps identify coverage gaps, complementarities, and allocation priorities.

The intervention landscape can be divided into several categories: technical approaches (alignment, interpretability, control), governance mechanisms (legislation, compute governance, international coordination), field building (talent, funding, community), and resilience measures (epistemic security, economic adaptation). Each category has different tractability profiles, timelines, and risk coverage—understanding these tradeoffs is essential for strategic resource allocation.

An effective safety portfolio requires both breadth (covering diverse failure modes) and depth (sufficient investment in each area to achieve impact). The current portfolio shows significant concentration in certain areas (RLHF, capability evaluations) while other areas remain relatively neglected (epistemic resilience, international coordination).

Field Growth Trajectory

| Metric | 2022 | 2025 | Growth Rate | Notes |
|---|---|---|---|---|
| Technical AI Safety FTEs | 300 | 600 | 21%/year | AI Safety Field Growth Analysis 2025 |
| Non-Technical AI Safety FTEs | 100 | 500 | 71%/year | Governance, policy, operations |
| Total AI Safety FTEs | 400 | 1,100 | 40%/year | Field-wide compound growth |
| AI Safety Organizations | ≈50 | ≈120 | 24%/year | Exponential growth since 2020 |
| Capabilities FTEs (comparison) | ≈3,000 | ≈15,000 | 30-40%/year | OpenAI alone: 300 → 3,000 |

Critical Comparison: While the AI safety workforce has grown substantially, capabilities research is growing 30-40% per year. The ratio of capabilities to safety researchers has remained roughly constant at 10-15:1, so the absolute gap continues to widen.
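The growth-rate column above is a three-year compound annual rate; a minimal sketch of the calculation, using the total-FTE row:

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1

# Total AI safety FTEs grew from 400 (2022) to 1,100 (2025).
total_growth = cagr(400, 1_100, 3)
print(f"total FTE growth: {total_growth:.0%}/year")  # matches the 40%/year row
```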

Top Research Categories (by FTEs):

  1. Miscellaneous technical AI safety research
  2. LLM safety
  3. Interpretability

Intervention Categories and Risk Coverage


Intervention by Risk Matrix

This matrix shows how strongly each major intervention addresses each risk category. Ratings are based on current evidence and expert assessments.

| Intervention | Accident Risks | Misuse Risks | Structural Risks | Epistemic Risks | Primary Mechanism |
|---|---|---|---|---|---|
| Interpretability | High | Low | Low | -- | Detect deception and misalignment in model internals |
| AI Control | High | Medium | -- | -- | External constraints regardless of AI intentions |
| Evaluations | High | Medium | Low | -- | Pre-deployment testing for dangerous capabilities |
| RLHF/Constitutional AI | Medium | Medium | -- | -- | Train models to follow human preferences |
| Scalable Oversight | Medium | Low | -- | -- | Human supervision of superhuman systems |
| Compute Governance | Low | High | Medium | -- | Hardware chokepoints limit access |
| Export Controls | Low | High | Medium | -- | Restrict adversary access to training compute |
| Responsible Scaling | Medium | Medium | Low | -- | Capability thresholds trigger safety requirements |
| International Coordination | Low | Medium | High | -- | Reduce racing dynamics through agreements |
| AI Safety Institutes | Medium | Medium | Medium | -- | Government capacity for evaluation and oversight |
| Field Building | Medium | Low | Medium | Low | Grow talent pipeline and research capacity |
| Epistemic Security | -- | Low | Low | High | Protect collective truth-finding capacity |
| Content Authentication | -- | Medium | -- | High | Verify authentic content in synthetic era |

Legend: High = primary focus, addresses directly; Medium = secondary impact; Low = indirect or limited; -- = minimal relevance
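The matrix suggests uneven coverage across risk categories; a small sketch that counts High ratings per risk column makes this explicit (the lists are transcribed from the matrix above):

```python
# Interventions with a 'High' rating in each risk column of the matrix.
high_ratings = {
    "Accident": ["Interpretability", "AI Control", "Evaluations"],
    "Misuse": ["Compute Governance", "Export Controls"],
    "Structural": ["International Coordination"],
    "Epistemic": ["Epistemic Security", "Content Authentication"],
}

for risk, interventions in high_ratings.items():
    print(f"{risk}: {len(interventions)} intervention(s) rated High")
```

Structural risks have only one primary intervention, which is one reason international coordination appears under the coverage gaps later on this page.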


Prioritization Framework

This framework evaluates interventions across the standard Importance-Tractability-Neglectedness (ITN) dimensions, with additional consideration for timeline fit and portfolio complementarity.

| Intervention | Tractability | Impact Potential | Neglectedness | Timeline Fit | Overall Priority |
|---|---|---|---|---|---|
| Interpretability | Medium | High | Low | Long | High |
| AI Control | High | Medium-High | Medium | Near | Very High |
| Evaluations | High | Medium | Low | Near | High |
| Compute Governance | High | High | Low | Near | Very High |
| International Coordination | Low | Very High | High | Long | High |
| Field Building | High | Medium | Medium | Ongoing | Medium-High |
| Epistemic Resilience | Medium | Medium | High | Near-Long | Medium-High |
| Scalable Oversight | Medium-Low | High | Medium | Long | Medium |
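The qualitative ratings above can be combined into a single number; the sketch below multiplies ordinal Tractability x Impact x Neglectedness values. Both the numeric scale and the multiplicative combination are illustrative assumptions, not the wiki's actual prioritization method:

```python
# Illustrative ordinal encoding of the qualitative ratings.
SCALE = {"Low": 1, "Medium-Low": 1.5, "Medium": 2, "Medium-High": 2.5,
         "High": 3, "Very High": 4}

# (tractability, impact potential, neglectedness) from the table above.
interventions = {
    "Interpretability": ("Medium", "High", "Low"),
    "AI Control": ("High", "Medium-High", "Medium"),
    "Evaluations": ("High", "Medium", "Low"),
    "Compute Governance": ("High", "High", "Low"),
    "International Coordination": ("Low", "Very High", "High"),
    "Field Building": ("High", "Medium", "Medium"),
    "Epistemic Resilience": ("Medium", "Medium", "High"),
    "Scalable Oversight": ("Medium-Low", "High", "Medium"),
}

scores = {name: SCALE[t] * SCALE[i] * SCALE[n]
          for name, (t, i, n) in interventions.items()}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:g}")
```

A multiplicative combination mirrors the usual ITN logic (a very low value on any axis drags the whole score down), but it will not exactly reproduce the Overall Priority column, which also weighs timeline fit and portfolio complementarity.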

Prioritization Rationale

Very High Priority:

  • AI Control scores highly because it provides near-term safety benefits (70-85% tractability for human-level systems) regardless of whether alignment succeeds. It represents a practical bridge during the transition period. Redwood Research received $1.2M for control research in 2024.
  • Compute Governance is one of few levers creating physical constraints on AI development. Hardware chokepoints exist, some measures are already implemented (EU AI Act compute thresholds, US export controls), and impact potential is substantial. GovAI produces leading research on compute governance mechanisms.

High Priority:

  • Interpretability is potentially essential if alignment proves difficult (only reliable way to detect sophisticated deception). MIT Technology Review named mechanistic interpretability a 2026 Breakthrough Technology. Anthropic's attribution graphs revealed hidden reasoning in Claude 3.5 Haiku. FAS recommends federal R&D funding through DARPA and NSF.
  • Evaluations provide measurable near-term impact and are already standard practice at major labs. Coefficient Giving launched an RFP for capability evaluations ($200K-$5M grants). METR partners with Anthropic and OpenAI on frontier model evaluations. NIST invested $20M in AI Economic Security Centers.
  • International Coordination has very high impact potential for addressing structural risks like racing dynamics, but low tractability given current geopolitical tensions. The International AI Safety Report 2025, led by Yoshua Bengio with 100+ authors from 30 countries, represents the largest global collaboration to date.

Medium-High Priority:

  • Field Building and Epistemic Resilience are relatively neglected meta-level interventions that multiply the effectiveness of direct technical and governance work. 80,000 Hours notes good funding opportunities in AI safety exist for qualified researchers.

Portfolio Gaps and Complementarities

Coverage Gaps

Analysis of the current intervention portfolio reveals several areas where coverage is thin:

| Gap Area | Current Investment | Risk Exposure | Recommended Action |
|---|---|---|---|
| Epistemic Risks | Under 5% of portfolio ($3-5M/year) | Epistemic collapse, reality fragmentation | Increase to 8-10% of portfolio; invest in content authentication and epistemic infrastructure |
| Long-term Structural Risks | 4-6% of portfolio; international coordination is low tractability | Lock-in, concentration of power | Develop alternative coordination mechanisms; invest in governance research |
| Post-Incident Recovery | Under 1% of portfolio | All risk categories | Develop recovery protocols and resilience measures; allocate 3-5% of portfolio |
| Misuse by State Actors | Export controls are primary lever; $5-10M in policy research | Authoritarian tools, surveillance | Research additional governance mechanisms; increase to $15-25M |
| Independent Evaluation Capacity | 70%+ of evals done by labs themselves | Conflict of interest, verification gaps | Coefficient Giving's eval RFP addresses this with $200K-$5M grants |

Key Complementarities

Certain interventions work better together than in isolation:

Technical + Governance:

  • AI Evaluations inform Responsible Scaling Policies (RSPs) thresholds
  • Interpretability enables verification for governance commitments
  • AI Control provides safety margin while governance matures

Near-term + Long-term:

  • Compute Governance buys time for Interpretability research
  • AI Evaluations identify near-term risks while Scalable Oversight develops
  • AI Safety Field Building and Community ensures capacity for future technical work

Prevention + Resilience:

  • Technical safety research aims to prevent failures
  • AI-Era Epistemic Security and economic resilience limit damage if prevention fails
  • Both are needed for robust defense-in-depth

Portfolio Funding Allocation

The following table estimates 2024 funding levels by intervention area and compares them to recommended allocations based on neglectedness and impact potential. Total external AI safety funding was approximately $110-130 million in 2024, with Coefficient Giving providing ~60% of this amount.

| Intervention Area | Est. 2024 Funding | % of Total | Recommended Shift | Key Funders |
|---|---|---|---|---|
| RLHF/Training Methods | $15-35M | ≈25% | Decrease to 20% | Frontier labs (internal), academic grants |
| Interpretability | $15-25M | ≈18% | Maintain | Coefficient Giving, Superalignment Fast Grants ($10M) |
| Evaluations & Evals Infrastructure | $12-18M | ≈13% | Increase to 20% | CAIS ($1.5M), UK AISI, labs |
| AI Control Research | $1-12M | ≈9% | Increase to 15% | Redwood Research ($1.2M), Anthropic |
| Compute Governance | $1-10M | ≈7% | Increase to 12% | Government programs, policy organizations |
| Field Building & Talent | $10-15M | ≈11% | Maintain | 80,000 Hours, MATS, various fellowships |
| Governance & Policy | $1-12M | ≈9% | Increase to 12% | Coefficient Giving policy grants, government initiatives |
| International Coordination | $1-5M | ≈4% | Increase to 8% | UK/EU government initiatives (≈$14M total) |
| Epistemic Resilience | $1-4M | ≈3% | Increase to 8% | Very few dedicated funders |
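In dollar terms, the recommended shifts can be sketched against an assumed ~$120M external pool (the midpoint of the $110-130M estimate; the pool size and row selection are illustrative):

```python
POOL = 120  # assumed external funding pool in $M (midpoint of $110-130M)

# (current share, recommended share) for selected rows of the table above.
shifts = {
    "RLHF/Training Methods": (0.25, 0.20),
    "Evaluations & Evals Infrastructure": (0.13, 0.20),
    "AI Control Research": (0.09, 0.15),
    "Epistemic Resilience": (0.03, 0.08),
}

deltas = {area: (recommended - current) * POOL
          for area, (current, recommended) in shifts.items()}
for area, delta in deltas.items():
    print(f"{area}: {delta:+.1f}M/year")
```

Under these assumptions, moving evaluations from 13% to 20% of the pool is roughly an extra $8M/year, funded in part by the ~$6M/year decrease to RLHF/training methods.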

2025 Funding Landscape Update

| Funder | 2024 Allocation | Focus Areas | Source |
|---|---|---|---|
| Coefficient Giving | $63.6M | Technical safety, governance, field building | 60% of external funding |
| Jaan Tallinn | $20M | Long-term alignment research | Personal foundation |
| Eric Schmidt (Schmidt Sciences) | $10M | Safety benchmarking, adversarial evaluation | Quick Market Pitch |
| AI Safety Fund | $10M | Collaborative research (Anthropic, Google, Microsoft, OpenAI) | Frontier Model Forum |
| Future of Life Institute | $5M | Smaller grants, fellowships | Diverse portfolio |
| Steven Schuurman Foundation | €5M/year | Various AI safety initiatives | Elastic co-founder |
| Total External | $110-130M | | 2024 estimate |

2025 Trajectory: Early data (through July 2025) shows $67M already committed, putting the year on track to exceed 2024 totals by 40-50%.
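Funder concentration can be quantified with a Herfindahl-Hirschman index over the shares above; a sketch, assuming the $110M low end of the external estimate as the denominator:

```python
# Named funder allocations in $M, from the table above.
funders = {"Coefficient Giving": 63.6, "Jaan Tallinn": 20,
           "Eric Schmidt": 10, "AI Safety Fund": 10,
           "Future of Life Institute": 5}
total = 110  # assumed total external funding, $M (low end of $110-130M)

# HHI = sum of squared shares; above 0.25 is conventionally "highly concentrated".
hhi = sum((amount / total) ** 2 for amount in funders.values())
print(f"HHI = {hhi:.2f}")
```

Even with the low-end denominator, the index sits well above the conventional 0.25 high-concentration threshold, consistent with the single-point-of-failure concern raised in the funding gap analysis.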

Funding Gap Analysis

The funding landscape reveals several structural imbalances:

| Gap Type | Current State | Impact | Recommended Action |
|---|---|---|---|
| Climate vs AI safety | Climate philanthropy: ≈$1-15B; AI safety: ≈$130M | ≈100x disparity despite comparable catastrophic potential | Increase AI safety funding to at least $100M-1B annually |
| Capabilities vs safety | ≈$100B in AI data center capex (2024) vs ≈$130M safety | ≈1500:1 ratio | Redirect 0.5-1% of capabilities spending to safety |
| Funder concentration | Coefficient Giving: 60% of external funding | Single point of failure; limits diversity | Diversify funding sources; new initiatives like Humanity AI ($100M) |
| Talent pipeline | Over-optimized for researchers | Shortage in governance, operations, advocacy | Expand non-research talent programs |

Resource Allocation Assessment

| Area | Current Allocation | Recommended | Rationale |
|---|---|---|---|
| RLHF/Training | Very High | High | Deployed at scale but limited effectiveness against deceptive alignment |
| Interpretability | High | High | Rapid progress; potential for fundamental breakthroughs |
| Evaluations | High | Very High | Critical for identifying dangerous capabilities pre-deployment |
| AI Control | Medium | High | Near-term tractable; provides safety regardless of alignment |
| Compute Governance | Medium | High | One of few physical levers; already showing policy impact |
| International Coordination | Low | Medium | Low tractability but very high stakes |
| Epistemic Resilience | Very Low | Medium | Highly neglected; addresses underserved risk category |
| Field Building | Medium | Medium | Maintain current investment; returns are well-established |

Investment Concentration Risks

The current portfolio shows several structural vulnerabilities:

| Concentration Type | Current State | Risk | Mitigation |
|---|---|---|---|
| Funder concentration | Coefficient Giving provides ≈60% of external funding | Strategy changes affect entire field | Cultivate diverse funding sources |
| Geographic concentration | US and UK receive majority of funding | Limited global coordination capacity | Support emerging hubs (Berlin, Canada, Australia) |
| Frontier lab dependence | Most technical safety at Anthropic, OpenAI, DeepMind | Conflicts of interest; limited independent verification | Increase funding to MIRI ($1.1M), Redwood, ARC |
| Research over operations | Pipeline over-optimized for researchers | Shortage of governance, advocacy, operations talent | Expand non-research career paths |
| Technical over governance | Technical ≈60% vs governance ≈15% of funding | Governance may be more neglected and tractable | Rebalance toward policy research |
| Prevention over resilience | Minimal investment in post-incident recovery | No fallback if prevention fails | Develop recovery protocols |

Strategic Considerations

Worldview Dependencies

Different beliefs about AI risk lead to different portfolio recommendations:

| Worldview | Prioritize | Deprioritize |
|---|---|---|
| Alignment is very hard | Interpretability, Control, International coordination | RLHF, Voluntary commitments |
| Misuse is the main risk | Compute governance, Content authentication, Legislation | Interpretability, Agent foundations |
| Short timelines | AI Control, Evaluations, Responsible scaling | Long-term governance research |
| Racing dynamics dominate | International coordination, Compute governance | Unilateral safety research |
| Epistemic collapse is likely | Epistemic security, Content authentication | Technical alignment |

Portfolio Robustness

A robust portfolio should satisfy the following criteria, which can help evaluate current gaps and guide future allocation:

| Robustness Criterion | Current Status | Gap Assessment | Target |
|---|---|---|---|
| Cover multiple failure modes | Accident risks: 60% coverage; Misuse: 50%; Structural: 30%; Epistemic: under 15% | Medium gap | 70%+ coverage across all categories |
| Prevention and resilience | ≈95% prevention, ≈5% resilience | Large gap | 80% prevention, 20% resilience |
| Near-term and long-term balance | 55% near-term (evals, control), 45% long-term (interpretability, governance) | Small gap | Maintain current balance |
| Independent research capacity | Frontier labs: 70%+ of technical safety; independents: under 30% | Medium gap | 50/50 split between labs and independents |
| Support multiple worldviews | Most interventions robust across scenarios | Small gap | Maintain |
| Geographic diversity | US/UK: 80%+ of funding; EU: 10%; ROW: under 10% | Medium gap | US/UK: 60%, EU: 20%, ROW: 20% |
| Funder diversity | 5 funders provide 85% of external funding; Coefficient Giving alone provides 60% | Large gap | No single funder greater than 25% |

Key Sources

| Source | Type | Relevance |
|---|---|---|
| Coefficient Giving Progress 2024 | Funder Report | Primary data on AI safety funding levels and priorities |
| AI Safety Funding Situation Overview | Analysis | Comprehensive breakdown of funding sources and gaps |
| AI Safety Needs More Funders | Policy Brief | Comparison to other catastrophic risk funding |
| AI Safety Field Growth Analysis 2025 | Research | Field growth metrics, 1,100 FTEs, 21% annual growth |
| International AI Safety Report 2025 | Global Report | 100+ authors, 30 countries, Yoshua Bengio lead |
| Future of Life AI Safety Index 2025 | Industry Assessment | 33 indicators across 6 domains for 7 leading companies |
| Coefficient Giving Technical AI Safety RFP | Grant Program | $40M allocation for technical safety research |
| Coefficient Giving Capability Evaluations RFP | Grant Program | $200K-$5M grants for evaluation infrastructure |
| America's AI Action Plan (July 2025) | Policy | US government AI priorities including evaluations ecosystem |
| Accelerating AI Interpretability (FAS) | Policy Brief | Federal funding recommendations for interpretability |
| 80,000 Hours: AI Risk | Career Guidance | Intervention prioritization and neglectedness analysis |
| RLHF Limitations Paper | Research | Evidence on limitations of current alignment methods |
| Carnegie AI Safety as Global Public Good | Policy Analysis | International coordination challenges and research priorities |
| ITU Annual AI Governance Report 2025 | Global Report | AI governance landscape across nations |

References

2. AI Safety Field Growth Analysis 2025 · EA Forum · Stephen McAleese · 2025 · ★★★☆☆

   Comprehensive study tracking the expansion of technical and non-technical AI safety fields from 2010 to 2025. Documents growth from approximately 400 to 1,100 full-time equivalent researchers across both domains.

3. International AI Safety Report 2025 · internationalaisafetyreport.org

   The International AI Safety Report 2025 provides a global scientific assessment of general-purpose AI capabilities, risks, and potential management techniques. It represents a collaborative effort by 96 experts from 30 countries to establish a shared understanding of AI safety challenges.

5. Open Philanthropy grants database · openphilanthropy.org

   Open Philanthropy provides grants across multiple domains including global health, catastrophic risks, and scientific progress. Their focus spans technological, humanitarian, and systemic challenges.

6. AI Safety Fund · frontiermodelforum.org

   Open Philanthropy reviewed its philanthropic efforts in 2024, focusing on expanding partnerships, supporting AI safety research, and making strategic grants across multiple domains including global health and catastrophic risk reduction.

8. Governance research · Centre for the Governance of AI · Government · ★★★★☆

9. Mechanistic interpretability: 10 Breakthrough Technologies 2026 · MIT Technology Review

10. METR · metr.org · ★★★★☆

13. An Overview of the AI Safety Funding Situation · LessWrong · Stephen McAleese · 2023 · Blog post · ★★★☆☆

    Analyzes AI safety funding from sources like Open Philanthropy, Survival and Flourishing Fund, and academic institutions. Estimates total global AI safety spending and explores talent versus funding constraints.

15. EA Forum analysis · EA Forum · Christopher Clay · 2025 · Blog post

16. AI Alignment through RLHF · arXiv · Adam Dahlgren Lindström et al. · 2024 · Paper · ★★★☆☆

17. FLI AI Safety Index Summer 2025 · Future of Life Institute · ★★★☆☆

    The FLI AI Safety Index Summer 2025 assesses leading AI companies' safety efforts, finding widespread inadequacies in risk management and existential safety planning. Anthropic leads with a C+ grade, while most companies score poorly across critical safety domains.

Related Pages


Safety Research

AI Control · Scalable Oversight

Analysis

Short AI Timeline Policy Implications · AI Risk Portfolio Analysis · AI Safety Intervention Effectiveness Matrix · Intervention Timing Windows · AI Safety Research Allocation Model

Approaches

AI-Era Epistemic Security · AI Content Authentication · AI Alignment

Policy

US AI Chip Export Controls · AI Safety Institutes (AISIs) · Compute Governance

Key Debates

Technical AI Safety Research · AI Safety Field Building and Community

Organizations

OpenAI

Concepts

RLHF

Other

Jaan Tallinn · Yoshua Bengio