Intervention Portfolio

| Dimension | Rating | Evidence |
|---|---|---|
| Tractability | Medium-High | Varies widely: evaluations (high), compute governance (high), international coordination (low). Open Philanthropy’s 2025 RFP allocated $40M for technical safety research. |
| Scalability | High | Portfolio approach scales across 4 risk categories and multiple timelines. AI Safety Field Growth Analysis shows 21% annual FTE growth rate. |
| Current Maturity | Medium | Core interventions established; significant gaps in epistemic resilience (less than 5% of portfolio) and post-incident recovery (under 1%). |
| Research Workforce | ≈1,100 FTEs | 600 technical + 500 non-technical AI safety FTEs in 2025, up from 400 total in 2022 (AI Safety Field Growth Analysis). |
| Time Horizon | Near-Long | Near-term (evaluations, control) complement long-term work (interpretability, governance). International AI Safety Report 2025 emphasizes urgency. |
| Funding Level | $110-130M/year external | 2024 external funding. Early 2025 shows 40-50% acceleration with $67M committed through July. Internal lab spending adds $500-550M for ≈$650M total (Coefficient Giving analysis). |
| Funding Concentration | 85% from 5 sources | Open Philanthropy: $63.6M (60%); Jaan Tallinn: $20M; Eric Schmidt: $10M; AI Safety Fund: $10M; FLI: $5M |
| Safety/Capabilities Ratio | ≈0.5-1.3% | $600-650M safety vs $50B+ capabilities spending. FAS recommends 30% of compute for safety research. |

This page provides a strategic view of the AI safety intervention landscape, analyzing how different interventions address different risk categories and improve key parameters in the AI Transition Model. Rather than examining interventions individually, this portfolio view helps identify coverage gaps, complementarities, and allocation priorities.

The intervention landscape can be divided into several categories: technical approaches (alignment, interpretability, control), governance mechanisms (legislation, compute governance, international coordination), field building (talent, funding, community), and resilience measures (epistemic security, economic adaptation). Each category has different tractability profiles, timelines, and risk coverage—understanding these tradeoffs is essential for strategic resource allocation.

An effective safety portfolio requires both breadth (covering diverse failure modes) and depth (sufficient investment in each area to achieve impact). The current portfolio shows significant concentration in certain areas (RLHF, capability evaluations) while other areas remain relatively neglected (epistemic resilience, international coordination).

| Metric | 2022 | 2025 | Growth Rate | Notes |
|---|---|---|---|---|
| Technical AI Safety FTEs | 300 | 600 | 21%/year | AI Safety Field Growth Analysis 2025 |
| Non-Technical AI Safety FTEs | 100 | 500 | 71%/year | Governance, policy, operations |
| Total AI Safety FTEs | 400 | 1,100 | 40%/year | Field-wide compound growth |
| AI Safety Organizations | ≈50 | ≈120 | 24%/year | Exponential growth since 2020 |
| Capabilities FTEs (comparison) | ≈3,000 | ≈15,000 | 30-40%/year | OpenAI alone: 300 → 3,000 |

Critical Comparison: While the AI safety workforce has grown substantially, capabilities research is growing 30-40% per year. The ratio of capabilities to safety researchers has remained roughly constant at 10-15:1, meaning the absolute gap continues to widen.
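
To make the widening-gap arithmetic concrete, the sketch below projects both headcounts forward from the 2025 figures above under the simplifying assumption that both workforces grow at 40%/year (the field-wide safety growth rate and the upper end of the cited capabilities range); the horizon and rate choices are illustrative, not projections from the source.

```python
# Illustrative projection of the capabilities-vs-safety headcount gap.
# Starting figures come from the field-growth table above; holding both
# growth rates at 40%/year is an assumption made only to show that a
# constant ratio still implies a widening absolute gap.

safety_ftes = 1_100          # total AI safety FTEs, 2025
capabilities_ftes = 15_000   # capabilities FTEs, 2025
safety_growth = 0.40         # field-wide safety growth rate cited above
capabilities_growth = 0.40   # upper end of the cited 30-40%/year range

for year in range(2025, 2031):
    gap = capabilities_ftes - safety_ftes
    ratio = capabilities_ftes / safety_ftes
    print(f"{year}: safety={safety_ftes:7,.0f}  capabilities={capabilities_ftes:8,.0f}  "
          f"gap={gap:8,.0f}  ratio={ratio:.1f}:1")
    safety_ftes *= 1 + safety_growth
    capabilities_ftes *= 1 + capabilities_growth
```

With both workforces growing at the same rate, the ratio stays fixed while the absolute gap grows from roughly 14,000 to over 70,000 FTEs by 2030.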

Top Research Categories (by FTEs):

  1. Miscellaneous technical AI safety research
  2. LLM safety
  3. Interpretability


This matrix shows how strongly each major intervention addresses each risk category. Ratings are based on current evidence and expert assessments.

| Intervention | Accident Risks | Misuse Risks | Structural Risks | Epistemic Risks | Primary Mechanism |
|---|---|---|---|---|---|
| Interpretability | High | Low | Low | — | Detect deception and misalignment in model internals |
| AI Control | High | Medium | — | — | External constraints regardless of AI intentions |
| Evaluations | High | Medium | Low | — | Pre-deployment testing for dangerous capabilities |
| RLHF/Constitutional AI | Medium | Medium | — | — | Train models to follow human preferences |
| Scalable Oversight | Medium | Low | — | — | Human supervision of superhuman systems |
| Compute Governance | Low | High | Medium | — | Hardware chokepoints limit access |
| Export Controls | Low | High | Medium | — | Restrict adversary access to training compute |
| Responsible Scaling | Medium | Medium | Low | — | Capability thresholds trigger safety requirements |
| International Coordination | Low | Medium | High | — | Reduce racing dynamics through agreements |
| AI Safety Institutes | Medium | Medium | Medium | — | Government capacity for evaluation and oversight |
| Field Building | Medium | Low | Medium | Low | Grow talent pipeline and research capacity |
| Epistemic Security | — | Low | Low | High | Protect collective truth-finding capacity |
| Content Authentication | — | Medium | — | High | Verify authentic content in synthetic era |

Legend: High = primary focus, addresses directly; Medium = secondary impact; Low = indirect or limited; — = minimal relevance
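
One way to read the matrix for gaps is to tally how many listed interventions address each risk category at each strength. The sketch below does this for a subset of the rows (with ratings as reconstructed above); it is a reading aid rather than an analysis from the source, but it makes the thin epistemic-risk coverage easy to see.

```python
# Tally risk-category coverage from the intervention-risk matrix above.
# Ratings follow a subset of the table's rows as reconstructed; "-" means
# minimal relevance. Purely illustrative.

MATRIX = {
    #                          accident   misuse    structural epistemic
    "Interpretability":        ("High",   "Low",    "Low",     "-"),
    "AI Control":              ("High",   "Medium", "-",       "-"),
    "Evaluations":             ("High",   "Medium", "Low",     "-"),
    "Compute Governance":      ("Low",    "High",   "Medium",  "-"),
    "International Coord.":    ("Low",    "Medium", "High",    "-"),
    "Field Building":          ("Medium", "Low",    "Medium",  "Low"),
    "Epistemic Security":      ("-",      "Low",    "Low",     "High"),
    "Content Authentication":  ("-",      "Medium", "-",       "High"),
}
CATEGORIES = ["accident", "misuse", "structural", "epistemic"]

for i, category in enumerate(CATEGORIES):
    ratings = [row[i] for row in MATRIX.values()]
    high = ratings.count("High")
    medium = ratings.count("Medium")
    print(f"{category:>10}: {high} High, {medium} Medium "
          f"(of {len(MATRIX)} interventions listed)")
```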


This framework evaluates interventions across the standard Importance-Tractability-Neglectedness (ITN) dimensions, with additional consideration for timeline fit and portfolio complementarity.

| Intervention | Tractability | Impact Potential | Neglectedness | Timeline Fit | Overall Priority |
|---|---|---|---|---|---|
| Interpretability | Medium | High | Low | Long | High |
| AI Control | High | Medium-High | Medium | Near | Very High |
| Evaluations | High | Medium | Low | Near | High |
| Compute Governance | High | High | Low | Near | Very High |
| International Coordination | Low | Very High | High | Long | High |
| Field Building | High | Medium | Medium | Ongoing | Medium-High |
| Epistemic Resilience | Medium | Medium | High | Near-Long | Medium-High |
| Scalable Oversight | Medium-Low | High | Medium | Long | Medium |
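
As a rough illustration of how these qualitative ratings could be combined, the sketch below maps each rating to a number and multiplies the three ITN dimensions, the standard ITN heuristic. The numeric scale is an assumption, and the resulting ordering will not exactly match the Overall Priority column, which also weighs timeline fit and portfolio complementarity.

```python
# A minimal numeric sketch of ITN-style prioritization. The 1-5 scale and the
# multiplicative combination are illustrative assumptions, not the page's
# methodology; input ratings come from the table above.

SCALE = {"Low": 1, "Medium-Low": 1.5, "Medium": 2, "Medium-High": 2.5,
         "High": 3, "Very High": 4}

# (tractability, impact potential, neglectedness)
INTERVENTIONS = {
    "Interpretability":           ("Medium", "High", "Low"),
    "AI Control":                 ("High", "Medium-High", "Medium"),
    "Evaluations":                ("High", "Medium", "Low"),
    "Compute Governance":         ("High", "High", "Low"),
    "International Coordination": ("Low", "Very High", "High"),
    "Field Building":             ("High", "Medium", "Medium"),
    "Epistemic Resilience":       ("Medium", "Medium", "High"),
    "Scalable Oversight":         ("Medium-Low", "High", "Medium"),
}

def itn_score(tractability: str, impact: str, neglectedness: str) -> float:
    """Multiply the three dimensions, as in the standard ITN heuristic."""
    return SCALE[tractability] * SCALE[impact] * SCALE[neglectedness]

ranked = sorted(INTERVENTIONS.items(),
                key=lambda kv: itn_score(*kv[1]), reverse=True)
for name, dims in ranked:
    print(f"{name:<28} score={itn_score(*dims):.1f}")
```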

Very High Priority:

  • AI Control scores highly because it provides near-term safety benefits (70-85% tractability for human-level systems) regardless of whether alignment succeeds. It represents a practical bridge during the transition period. Redwood Research received $1.2M for control research in 2024.
  • Compute Governance is one of few levers creating physical constraints on AI development. Hardware chokepoints exist, some measures are already implemented (EU AI Act compute thresholds, US export controls), and impact potential is substantial. GovAI produces leading research on compute governance mechanisms.

High Priority:

  • Interpretability is potentially essential if alignment proves difficult (only reliable way to detect sophisticated deception). MIT Technology Review named mechanistic interpretability a 2026 Breakthrough Technology. Anthropic’s attribution graphs revealed hidden reasoning in Claude 3.5 Haiku. FAS recommends federal R&D funding through DARPA and NSF.
  • Evaluations provide measurable near-term impact and are already standard practice at major labs. Open Philanthropy launched an RFP for capability evaluations ($200K-$5M grants). METR partners with Anthropic and OpenAI on frontier model evaluations. NIST invested $20M in AI Economic Security Centers.
  • International Coordination has very high impact potential for addressing structural risks like racing dynamics, but low tractability given current geopolitical tensions. The International AI Safety Report 2025, led by Yoshua Bengio with 100+ authors from 30 countries, represents the largest global collaboration to date.

Medium-High Priority:

  • Field Building and Epistemic Resilience are relatively neglected meta-level interventions that multiply the effectiveness of direct technical and governance work. 80,000 Hours notes that good funding opportunities exist in AI safety for qualified researchers.

Each intervention affects different parameters in the AI Transition Model. This mapping helps identify which interventions address which aspects of the transition.

Technical interventions:

| Intervention | Primary Parameter | Secondary Parameters | Mechanism |
|---|---|---|---|
| Interpretability | Interpretability Coverage | Alignment Robustness, Safety-Capability Gap | Direct visibility into model internals |
| AI Control | Human Oversight Quality | Alignment Robustness | External constraints maintain oversight |
| Evaluations | Safety-Capability Gap | Safety Culture Strength, Human Oversight Quality | Pre-deployment testing identifies risks |
| Scalable Oversight | Human Oversight Quality | Alignment Robustness | Human supervision despite capability gaps |

Governance interventions:

| Intervention | Primary Parameter | Secondary Parameters | Mechanism |
|---|---|---|---|
| Compute Governance | Racing Intensity | Coordination Capacity, AI Control Concentration | Hardware chokepoints slow development |
| Responsible Scaling | Safety Culture Strength | Safety-Capability Gap | Capability thresholds trigger requirements |
| International Coordination | Coordination Capacity | Racing Intensity | Agreements reduce competitive pressure |
| Legislation | Regulatory Capacity | Safety Culture Strength | Binding requirements with enforcement |

Field building and resilience interventions:

| Intervention | Primary Parameter | Secondary Parameters | Mechanism |
|---|---|---|---|
| Field Building | Safety Research | Alignment Progress | Grow talent pipeline and capacity |
| Epistemic Security | Epistemic Health | Societal Trust, Reality Coherence | Protect collective knowledge |
| AI Safety Institutes | Institutional Quality | Regulatory Capacity | Government capacity for oversight |

Analysis of the current intervention portfolio reveals several areas where coverage is thin:

| Gap Area | Current Investment | Risk Exposure | Recommended Action |
|---|---|---|---|
| Epistemic Risks | Under 5% of portfolio ($3-5M/year) | Epistemic Collapse, Reality Fragmentation | Increase to 8-10% of portfolio; invest in content authentication and epistemic infrastructure |
| Long-term Structural Risks | 4-6% of portfolio; international coordination is low tractability | Lock-in, Concentration of Power | Develop alternative coordination mechanisms; invest in governance research |
| Post-Incident Recovery | Under 1% of portfolio | All risk categories | Develop recovery protocols and resilience measures; allocate 3-5% of portfolio |
| Misuse by State Actors | Export controls are primary lever; $5-10M in policy research | AI Authoritarian Tools, AI Mass Surveillance | Research additional governance mechanisms; increase to $15-25M |
| Independent Evaluation Capacity | 70%+ of evals done by labs themselves | Conflict of interest, verification gaps | Open Philanthropy’s eval RFP addresses this with $200K-$5M grants |

Certain interventions work better together than in isolation:

Technical + Governance:

  • AI Evaluations inform Responsible Scaling Policy (RSP) thresholds
  • Interpretability enables verification for International Coordination
  • AI Control provides safety margin while governance matures

Near-term + Long-term:

  • Compute Governance buys time for Interpretability research
  • AI Evaluations identify near-term risks while Scalable Oversight develops
  • Field Building and Community ensures capacity for future technical work

Prevention + Resilience:

  • Technical safety research aims to prevent failures
  • Epistemic Security and economic resilience limit damage if prevention fails
  • Both are needed for robust defense-in-depth

The following table estimates 2024 funding levels by intervention area and compares them to recommended allocations based on neglectedness and impact potential. Total external AI safety funding was approximately $110-130 million in 2024, with Coefficient Giving providing ~60% of this amount.

| Intervention Area | Est. 2024 Funding | % of Total | Recommended Shift | Key Funders |
|---|---|---|---|---|
| RLHF/Training Methods | $15-35M | ≈25% | Decrease to 20% | Frontier labs (internal), academic grants |
| Interpretability | $15-25M | ≈18% | Maintain | Coefficient Giving, Superalignment Fast Grants ($10M) |
| Evaluations & Evals Infrastructure | $12-18M | ≈13% | Increase to 20% | CAIS ($1.5M), UK AISI, labs |
| AI Control Research | $1-12M | ≈9% | Increase to 15% | Redwood Research ($1.2M), Anthropic |
| Compute Governance | $1-10M | ≈7% | Increase to 12% | Government programs, policy organizations |
| Field Building & Talent | $10-15M | ≈11% | Maintain | 80,000 Hours, MATS, various fellowships |
| Governance & Policy | $1-12M | ≈9% | Increase to 12% | Coefficient Giving policy grants, government initiatives |
| International Coordination | $1-5M | ≈4% | Increase to 8% | UK/EU government initiatives (≈$14M total) |
| Epistemic Resilience | $1-4M | ≈3% | Increase to 8% | Very few dedicated funders |
| Funder | 2024 Allocation | Focus Areas | Notes |
|---|---|---|---|
| Open Philanthropy | $63.6M | Technical safety, governance, field building | 60% of external funding |
| Jaan Tallinn | $20M | Long-term alignment research | Personal foundation |
| Eric Schmidt (Schmidt Sciences) | $10M | Safety benchmarking, adversarial evaluation | Quick Market Pitch |
| AI Safety Fund | $10M | Collaborative research (Anthropic, Google, Microsoft, OpenAI) | Frontier Model Forum |
| Future of Life Institute | $5M | Smaller grants, fellowships | Diverse portfolio |
| Steven Schuurman Foundation | €5M/year | Various AI safety initiatives | Elastic co-founder |
| Total External | $110-130M | | 2024 estimate |

2025 Trajectory: Early data (through July 2025) shows $67M already committed, putting the year on track to exceed 2024 totals by 40-50%.
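
To make the Recommended Shift column concrete in dollar terms, the sketch below applies the current and recommended shares to an assumed $120M external total (the midpoint of the 2024 estimate); areas marked Maintain are omitted, and the outputs are indicative only.

```python
# Convert the allocation table's percentage shares into rough dollar figures.
# The $120M total is an assumed midpoint of the $110-130M 2024 estimate;
# current and recommended shares follow the table above. Indicative only.

TOTAL_EXTERNAL_M = 120  # $M, assumed midpoint of the 2024 estimate

# (current share, recommended share); Maintain areas (Interpretability,
# Field Building) are omitted.
SHIFTS = {
    "RLHF/Training Methods":        (0.25, 0.20),
    "Evaluations & Infrastructure": (0.13, 0.20),
    "AI Control Research":          (0.09, 0.15),
    "Compute Governance":           (0.07, 0.12),
    "Governance & Policy":          (0.09, 0.12),
    "International Coordination":   (0.04, 0.08),
    "Epistemic Resilience":         (0.03, 0.08),
}

for area, (current, recommended) in SHIFTS.items():
    delta = (recommended - current) * TOTAL_EXTERNAL_M
    print(f"{area:<30} ${current * TOTAL_EXTERNAL_M:>5.1f}M -> "
          f"${recommended * TOTAL_EXTERNAL_M:>5.1f}M  ({delta:+.1f}M)")
```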

The funding landscape reveals several structural imbalances:

| Gap Type | Current State | Impact | Recommended Action |
|---|---|---|---|
| Climate vs AI safety | Climate philanthropy: ≈$1-15B; AI safety: ≈$130M | ≈100x disparity despite comparable catastrophic potential | Increase AI safety funding to at least $100M-1B annually |
| Capabilities vs safety | ≈$100B in AI data center capex (2024) vs ≈$130M safety | ≈770:1 ratio | Redirect 0.5-1% of capabilities spending to safety |
| Funder concentration | Coefficient Giving: 60% of external funding | Single point of failure; limits diversity | Diversify funding sources; new initiatives like Humanity AI ($100M) |
| Talent pipeline | Over-optimized for researchers | Shortage in governance, operations, advocacy | Expand non-research talent programs |

| Area | Current Allocation | Recommended | Rationale |
|---|---|---|---|
| RLHF/Training | Very High | High | Deployed at scale but limited effectiveness against deceptive alignment |
| Interpretability | High | High | Rapid progress; potential for fundamental breakthroughs |
| Evaluations | High | Very High | Critical for identifying dangerous capabilities pre-deployment |
| AI Control | Medium | High | Near-term tractable; provides safety regardless of alignment |
| Compute Governance | Medium | High | One of few physical levers; already showing policy impact |
| International Coordination | Low | Medium | Low tractability but very high stakes |
| Epistemic Resilience | Very Low | Medium | Highly neglected; addresses underserved risk category |
| Field Building | Medium | Medium | Maintain current investment; returns are well-established |

The current portfolio shows several structural vulnerabilities:

| Concentration Type | Current State | Risk | Mitigation |
|---|---|---|---|
| Funder concentration | Coefficient Giving provides ≈60% of external funding | Strategy changes affect entire field | Cultivate diverse funding sources |
| Geographic concentration | US and UK receive majority of funding | Limited global coordination capacity | Support emerging hubs (Berlin, Canada, Australia) |
| Frontier lab dependence | Most technical safety at Anthropic, OpenAI, DeepMind | Conflicts of interest; limited independent verification | Increase funding to MIRI ($1.1M), Redwood, ARC |
| Research over operations | Pipeline over-optimized for researchers | Shortage of governance, advocacy, operations talent | Expand non-research career paths |
| Technical over governance | Technical ~60% vs governance ≈15% of funding | Governance may be more neglected and tractable | Rebalance toward policy research |
| Prevention over resilience | Minimal investment in post-incident recovery | No fallback if prevention fails | Develop recovery protocols |
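
One hedged way to put a number on funder concentration is a Herfindahl-Hirschman-style index, the sum of squared funding shares. The framing is not used by the source; the sketch below applies it to the funder table's figures, with the euro amount converted at roughly parity and an assumed $125M total inside the $110-130M estimate.

```python
# Quantify funder concentration with a Herfindahl-Hirschman-style index
# (sum of squared funding shares). Illustrative framing, not from the source.
# Residual external funding is treated as many small grants with negligible
# contribution to the index.

TOTAL_EXTERNAL_M = 125  # $M, assumed total within the $110-130M estimate

FUNDERS_M = {
    "Open Philanthropy": 63.6,
    "Jaan Tallinn": 20,
    "Schmidt Sciences": 10,
    "AI Safety Fund": 10,
    "Future of Life Institute": 5,
    "Steven Schuurman Foundation": 5.5,  # ~EUR 5M/year, assumed near parity
}

hhi = sum((amount / TOTAL_EXTERNAL_M) ** 2 for amount in FUNDERS_M.values())
print(f"HHI over named funders: {hhi:.2f}")
print("Rule of thumb: above ~0.25 is conventionally treated as highly concentrated.")
```

On these figures the index comes out around 0.30, consistent with the single-point-of-failure concern in the table above.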

Different beliefs about AI risk lead to different portfolio recommendations:

| Worldview | Prioritize | Deprioritize |
|---|---|---|
| Alignment is very hard | Interpretability, Control, International coordination | RLHF, Voluntary commitments |
| Misuse is the main risk | Compute governance, Content authentication, Legislation | Interpretability, Agent foundations |
| Short timelines | AI Control, Evaluations, Responsible scaling | Long-term governance research |
| Racing dynamics dominate | International coordination, Compute governance | Unilateral safety research |
| Epistemic collapse is likely | Epistemic security, Content authentication | Technical alignment |
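
A simple way to stress-test the portfolio against these worldviews is to weight each worldview by a subjective credence and count weighted endorsements per intervention. The credences below are placeholders, and the endorsement sets follow the table above with intervention names normalized.

```python
# Score interventions by how often credence-weighted worldviews prioritize
# them. Credences are placeholder values, not estimates from the source;
# prioritize lists follow the worldview table (names normalized so the same
# intervention matches across worldviews).

WORLDVIEW_CREDENCE = {
    "alignment is very hard": 0.30,
    "misuse is the main risk": 0.25,
    "short timelines": 0.20,
    "racing dynamics dominate": 0.15,
    "epistemic collapse is likely": 0.10,
}

PRIORITIZED = {
    "alignment is very hard": {"Interpretability", "AI Control", "International coordination"},
    "misuse is the main risk": {"Compute governance", "Content authentication", "Legislation"},
    "short timelines": {"AI Control", "Evaluations", "Responsible scaling"},
    "racing dynamics dominate": {"International coordination", "Compute governance"},
    "epistemic collapse is likely": {"Epistemic security", "Content authentication"},
}

scores = {}
for worldview, weight in WORLDVIEW_CREDENCE.items():
    for intervention in PRIORITIZED[worldview]:
        scores[intervention] = scores.get(intervention, 0.0) + weight

for intervention, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{intervention:<28} weighted endorsement: {score:.2f}")
```

Under these placeholder weights, AI control, international coordination, and compute governance come out as the most worldview-robust picks; different credences would shift the ordering.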

A robust portfolio should satisfy the following criteria, which can help evaluate current gaps and guide future allocation:

| Robustness Criterion | Current Status | Gap Assessment | Target |
|---|---|---|---|
| Cover multiple failure modes | Accident risks: 60% coverage; Misuse: 50%; Structural: 30%; Epistemic: under 15% | Medium gap | 70%+ coverage across all categories |
| Prevention and resilience | ~95% prevention, ≈5% resilience | Large gap | 80% prevention, 20% resilience |
| Near-term and long-term balance | 55% near-term (evals, control), 45% long-term (interpretability, governance) | Small gap | Maintain current balance |
| Independent research capacity | Frontier labs: 70%+ of technical safety; Independents: under 30% | Medium gap | 50/50 split between labs and independents |
| Support multiple worldviews | Most interventions robust across scenarios | Small gap | Maintain |
| Geographic diversity | US/UK: 80%+ of funding; EU: 10%; ROW: under 10% | Medium gap | US/UK: 60%, EU: 20%, ROW: 20% |
| Funder diversity | 5 funders provide 85% of external funding; Open Philanthropy alone provides 60% | Large gap | No single funder greater than 25% |

| Source | Type | Relevance |
|---|---|---|
| Coefficient Giving Progress 2024 | Funder Report | Primary data on AI safety funding levels and priorities |
| AI Safety Funding Situation Overview | Analysis | Comprehensive breakdown of funding sources and gaps |
| AI Safety Needs More Funders | Policy Brief | Comparison to other catastrophic risk funding |
| AI Safety Field Growth Analysis 2025 | Research | Field growth metrics, 1,100 FTEs, 21% annual growth |
| International AI Safety Report 2025 | Global Report | 100+ authors, 30 countries, Yoshua Bengio lead |
| Future of Life AI Safety Index 2025 | Industry Assessment | 33 indicators across 6 domains for 7 leading companies |
| Open Philanthropy Technical AI Safety RFP | Grant Program | $40M allocation for technical safety research |
| Open Philanthropy Capability Evaluations RFP | Grant Program | $200K-$5M grants for evaluation infrastructure |
| America’s AI Action Plan (July 2025) | Policy | US government AI priorities including evaluations ecosystem |
| Accelerating AI Interpretability (FAS) | Policy Brief | Federal funding recommendations for interpretability |
| 80,000 Hours: AI Risk | Career Guidance | Intervention prioritization and neglectedness analysis |
| RLHF Limitations Paper | Research | Evidence on limitations of current alignment methods |
| Carnegie AI Safety as Global Public Good | Policy Analysis | International coordination challenges and research priorities |
| ITU Annual AI Governance Report 2025 | Global Report | AI governance landscape across nations |

  • Responses Overview - Full list of interventions
  • Technical Approaches - Alignment, interpretability, control
  • Governance Approaches - Legislation, compute governance, international
  • Risks Overview - Risk categories addressed by interventions
  • AI Transition Model - Framework for understanding AI transition dynamics

The intervention portfolio collectively affects the AI Transition Model across all major factors:

| Factor | Key Interventions | Coverage |
|---|---|---|
| Misalignment Potential | Alignment research, interpretability, control | Technical safety |
| Civilizational Competence | Governance, institutions, epistemic tools | Coordination capacity |
| Transition Turbulence | Compute governance, international coordination | Racing dynamics |
| Misuse Potential | Resilience, authentication, detection | Harm reduction |

Portfolio balance matters: over-investment in any single intervention type creates vulnerability if that approach fails.