Intervention Portfolio
Quick Assessment
| Dimension | Rating | Evidence |
|---|---|---|
| Tractability | Medium-High | Varies widely: evaluations (high), compute governance (high), international coordination (low). Open Philanthropy’s 2025 RFP allocated $40M for technical safety research. |
| Scalability | High | Portfolio approach scales across 4 risk categories and multiple timelines. AI Safety Field Growth Analysis shows 21% annual FTE growth rate. |
| Current Maturity | Medium | Core interventions established; significant gaps in epistemic resilience (less than 5% of portfolio) and post-incident recovery (under 1%). |
| Research Workforce | ≈1,100 FTEs | 600 technical + 500 non-technical AI safety FTEs in 2025, up from 400 total in 2022 (AI Safety Field Growth Analysis). |
| Time Horizon | Near-Long | Near-term (evaluations, control) complement long-term work (interpretability, governance). International AI Safety Report 2025 emphasizes urgency. |
| Funding Level | $110-130M/year external | 2024 external funding. Early 2025 shows 40-50% acceleration with $67M committed through July. Internal lab spending adds $500-550M for ≈$650M total (Coefficient Giving analysis). |
| Funding Concentration | 85% from 5 sources | Open Philanthropy: $63.6M (60%); Jaan Tallinn: $20M; Eric Schmidt: $10M; AI Safety Fund: $10M; FLI: $5M |
| Safety/Capabilities Ratio | ≈0.5-1.3% | $600-650M safety vs $50B+ capabilities spending. FAS recommends 30% of compute for safety research. |
Overview
This page provides a strategic view of the AI safety intervention landscape, analyzing how different interventions address different risk categories and improve key parameters in the AI Transition Model. Rather than examining interventions individually, this portfolio view helps identify coverage gaps, complementarities, and allocation priorities.
The intervention landscape can be divided into several categories: technical approaches (alignment, interpretability, control), governance mechanisms (legislation, compute governance, international coordination), field building (talent, funding, community), and resilience measures (epistemic security, economic adaptation). Each category has different tractability profiles, timelines, and risk coverage—understanding these tradeoffs is essential for strategic resource allocation.
An effective safety portfolio requires both breadth (covering diverse failure modes) and depth (sufficient investment in each area to achieve impact). The current portfolio shows significant concentration in certain areas (RLHF, capability evaluations) while other areas remain relatively neglected (epistemic resilience, international coordination).
Field Growth Trajectory
| Metric | 2022 | 2025 | Growth Rate | Notes |
|---|---|---|---|---|
| Technical AI Safety FTEs | 300 | 600 | 21%/year | AI Safety Field Growth Analysis 2025 |
| Non-Technical AI Safety FTEs | 100 | 500 | 71%/year | Governance, policy, operations |
| Total AI Safety FTEs | 400 | 1,100 | 40%/year | Field-wide compound growth |
| AI Safety Organizations | ≈50 | ≈120 | 24%/year | Exponential growth since 2020 |
| Capabilities FTEs (comparison) | ≈3,000 | ≈15,000 | 30-40%/year | OpenAI alone: 300 → 3,000 |
Critical Comparison: While the AI safety workforce has grown substantially, capabilities research is growing at 30-40% per year. The ratio of capabilities to safety researchers has remained roughly constant at 10-15:1, meaning the absolute gap continues to widen.
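These growth rates are compound annual rates; a minimal sketch of the calculation, using start and end headcounts taken from the table above (illustrative figures only):

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two headcounts."""
    return (end / start) ** (1 / years) - 1

# Start/end figures from the table above (2022 -> 2025, i.e. 3 years)
print(f"Total AI safety FTEs: {cagr(400, 1_100, 3):.0%}/year")  # ~40%/year
print(f"Non-technical FTEs:   {cagr(100, 500, 3):.0%}/year")    # ~71%/year
```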
Top Research Categories (by FTEs):
- Miscellaneous technical AI safety research
- LLM safety
- Interpretability
Intervention Categories and Risk Coverage
Intervention by Risk Matrix
This matrix shows how strongly each major intervention addresses each risk category. Ratings are based on current evidence and expert assessments.
| Intervention | Accident Risks | Misuse Risks | Structural Risks | Epistemic Risks | Primary Mechanism |
|---|---|---|---|---|---|
| Interpretability | High | Low | Low | — | Detect deception and misalignment in model internals |
| AI Control | High | Medium | — | — | External constraints regardless of AI intentions |
| Evaluations | High | Medium | Low | — | Pre-deployment testing for dangerous capabilities |
| RLHF/Constitutional AI | Medium | Medium | — | — | Train models to follow human preferences |
| Scalable Oversight | Medium | Low | — | — | Human supervision of superhuman systems |
| Compute Governance | Low | High | Medium | — | Hardware chokepoints limit access |
| Export Controls | Low | High | Medium | — | Restrict adversary access to training compute |
| Responsible Scaling | Medium | Medium | Low | — | Capability thresholds trigger safety requirements |
| International Coordination | Low | Medium | High | — | Reduce racing dynamics through agreements |
| AI Safety Institutes | Medium | Medium | Medium | — | Government capacity for evaluation and oversight |
| Field Building | Medium | Low | Medium | Low | Grow talent pipeline and research capacity |
| Epistemic Security | — | Low | Low | High | Protect collective truth-finding capacity |
| Content Authentication | — | Medium | — | High | Verify authentic content in synthetic era |
Legend: High = primary focus, addresses directly; Medium = secondary impact; Low = indirect or limited; — = minimal relevance
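One rough way to read such a matrix quantitatively is to map the ratings onto numeric weights and average them per risk column; the weights and the trimmed-down subset below are illustrative assumptions for this sketch, not part of the underlying assessment:

```python
# Illustrative weights for the qualitative ratings (assumed for this sketch);
# "minimal" stands in for the table's dash entries
WEIGHTS = {"High": 1.0, "Medium": 0.5, "Low": 0.2, "minimal": 0.0}

# A trimmed-down subset of the matrix above: intervention -> rating per risk category
MATRIX = {
    "Interpretability":   {"accident": "High",    "misuse": "Low",    "structural": "Low",     "epistemic": "minimal"},
    "AI Control":         {"accident": "High",    "misuse": "Medium", "structural": "minimal", "epistemic": "minimal"},
    "Compute Governance": {"accident": "Low",     "misuse": "High",   "structural": "Medium",  "epistemic": "minimal"},
    "Epistemic Security": {"accident": "minimal", "misuse": "Low",    "structural": "Low",     "epistemic": "High"},
}

def coverage_by_risk(matrix: dict) -> dict:
    """Average rating weight per risk category across all interventions."""
    risks = sorted({risk for ratings in matrix.values() for risk in ratings})
    return {
        risk: sum(WEIGHTS[ratings[risk]] for ratings in matrix.values()) / len(matrix)
        for risk in risks
    }

print(coverage_by_risk(MATRIX))  # the lowest-scoring categories are the coverage gaps
```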
Prioritization Framework
This framework evaluates interventions across the standard Importance-Tractability-Neglectedness (ITN) dimensions, with additional consideration for timeline fit and portfolio complementarity.
| Intervention | Tractability | Impact Potential | Neglectedness | Timeline Fit | Overall Priority |
|---|---|---|---|---|---|
| Interpretability | Medium | High | Low | Long | High |
| AI Control | High | Medium-High | Medium | Near | Very High |
| Evaluations | High | Medium | Low | Near | High |
| Compute Governance | High | High | Low | Near | Very High |
| International Coordination | Low | Very High | High | Long | High |
| Field Building | High | Medium | Medium | Ongoing | Medium-High |
| Epistemic Resilience | Medium | Medium | High | Near-Long | Medium-High |
| Scalable Oversight | Medium-Low | High | Medium | Long | Medium |
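To compare rows on a single scale, the qualitative ITN ratings can be mapped to numbers and multiplied, so that weakness on any one dimension limits the total; the scale below is an illustrative assumption, not the methodology behind the table:

```python
# Illustrative numeric scale for the qualitative ratings (assumed for this sketch)
SCALE = {"Low": 1.0, "Medium-Low": 1.5, "Medium": 2.0, "Medium-High": 2.5,
         "High": 3.0, "Very High": 4.0}

def itn_score(tractability: str, impact: str, neglectedness: str) -> float:
    """Multiplicative ITN-style score: a low rating on any dimension drags the total down."""
    return SCALE[tractability] * SCALE[impact] * SCALE[neglectedness]

# Example rows from the table above
print(itn_score("High", "Medium-High", "Medium"))  # AI Control         -> 15.0
print(itn_score("Medium", "High", "Low"))          # Interpretability   ->  6.0
print(itn_score("Low", "Very High", "High"))       # Intl Coordination  -> 12.0
```

A multiplicative rule penalizes low tractability heavily, which is one reason International Coordination ranks below AI Control here despite its higher impact potential.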
Prioritization Rationale
Very High Priority:
- AI Control scores highly because it provides near-term safety benefits (70-85% tractability for human-level systems) regardless of whether alignment succeeds. It represents a practical bridge during the transition period. Redwood Research received $1.2M for control research in 2024.
- Compute Governance is one of few levers creating physical constraints on AI development. Hardware chokepoints exist, some measures are already implemented (EU AI Act compute thresholds, US export controls), and impact potential is substantial. GovAI produces leading research on compute governance mechanisms.
High Priority:
- Interpretability is potentially essential if alignment proves difficult (only reliable way to detect sophisticated deception). MIT Technology Review named mechanistic interpretability a 2026 Breakthrough Technology. Anthropic’s attribution graphs revealed hidden reasoning in Claude 3.5 Haiku. FAS recommends federal R&D funding through DARPA and NSF.
- Evaluations provide measurable near-term impact and are already standard practice at major labs. Open Philanthropy launched an RFP for capability evaluations ($200K-$5M grants). METR partners with Anthropic and OpenAI on frontier model evaluations. NIST invested $20M in AI Economic Security Centers.
- International Coordination has very high impact potential for addressing structural risks like racing dynamics, but low tractability given current geopolitical tensions. The International AI Safety Report 2025, led by Yoshua Bengio with 100+ authors from 30 countries, represents the largest global collaboration to date.
Medium-High Priority:
- Field Building and Epistemic Resilience are relatively neglected meta-level interventions that multiply the effectiveness of direct technical and governance work. 80,000 Hours notes that good funding opportunities in AI safety exist for qualified researchers.
AI Transition Model Integration
Each intervention affects different parameters in the AI Transition Model. This mapping helps identify which interventions address which aspects of the transition.
Technical Approaches
| Intervention | Primary Parameter | Secondary Parameters | Mechanism |
|---|---|---|---|
| Interpretability | Interpretability Coverage | Alignment Robustness, Safety-Capability Gap | Direct visibility into model internals |
| AI Control | Human Oversight Quality | Alignment Robustness | External constraints maintain oversight |
| Evaluations | Safety-Capability Gap | Safety Culture Strength, Human Oversight Quality | Pre-deployment testing identifies risks |
| Scalable Oversight | Human Oversight Quality | Alignment Robustness | Human supervision despite capability gaps |
Governance Approaches
| Intervention | Primary Parameter | Secondary Parameters | Mechanism |
|---|---|---|---|
| Compute Governance | Racing Intensity | Coordination Capacity, AI Control Concentration | Hardware chokepoints slow development |
| Responsible Scaling | Safety Culture Strength | Safety-Capability Gap | Capability thresholds trigger requirements |
| International Coordination | Coordination Capacity | Racing Intensity | Agreements reduce competitive pressure |
| Legislation | Regulatory Capacity | Safety Culture Strength | Binding requirements with enforcement |
Meta-Level Interventions
| Intervention | Primary Parameter | Secondary Parameters | Mechanism |
|---|---|---|---|
| Field Building | Safety Research | Alignment Progress | Grow talent pipeline and capacity |
| Epistemic Security | Epistemic Health | Societal Trust, Reality Coherence | Protect collective knowledge |
| AI Safety Institutes | Institutional Quality | Regulatory Capacity | Government capacity for oversight |
Portfolio Gaps and Complementarities
Coverage Gaps
Analysis of the current intervention portfolio reveals several areas where coverage is thin:
| Gap Area | Current Investment | Risk Exposure | Recommended Action |
|---|---|---|---|
| Epistemic Risks | Under 5% of portfolio ($3-5M/year) | Epistemic Collapse, Reality Fragmentation | Increase to 8-10% of portfolio; invest in content authentication and epistemic infrastructure |
| Long-term Structural Risks | 4-6% of portfolio; international coordination is low tractability | Lock-in, Concentration of Power | Develop alternative coordination mechanisms; invest in governance research |
| Post-Incident Recovery | Under 1% of portfolio | All risk categories | Develop recovery protocols and resilience measures; allocate 3-5% of portfolio |
| Misuse by State Actors | Export controls are primary lever; $5-10M in policy research | AI Authoritarian Tools, AI Mass Surveillance | Research additional governance mechanisms; increase to $15-25M |
| Independent Evaluation Capacity | 70%+ of evals done by labs themselves | Conflict of interest, verification gaps | Open Philanthropy’s eval RFP addresses this with $200K-$5M grants |
Key Complementarities
Certain interventions work better together than in isolation:
Technical + Governance:
- AI Evaluations inform Responsible Scaling Policies (RSPs) thresholds
- Interpretability enables verification for International Coordination
- AI Control provides safety margin while governance matures
Near-term + Long-term:
- Compute Governance buys time for Interpretability research
- AI Evaluations identify near-term risks while Scalable Oversight develops
- Field Building and Community ensures capacity for future technical work
Prevention + Resilience:
- Technical safety research aims to prevent failures
- Epistemic Security and economic resilience limit damage if prevention fails
- Both are needed for robust defense-in-depth
Portfolio Funding Allocation
The following table estimates 2024 funding levels by intervention area and compares them to recommended allocations based on neglectedness and impact potential. Total external AI safety funding was approximately $110-130 million in 2024, with Coefficient Giving providing ~60% of this amount.
| Intervention Area | Est. 2024 Funding | % of Total | Recommended Shift | Key Funders |
|---|---|---|---|---|
| RLHF/Training Methods | $15-35M | ≈25% | Decrease to 20% | Frontier labs (internal), academic grants |
| Interpretability | $15-25M | ≈18% | Maintain | Coefficient Giving, Superalignment Fast Grants ($10M) |
| Evaluations & Evals Infrastructure | $12-18M | ≈13% | Increase to 20% | CAIS ($1.5M), UK AISI, labs |
| AI Control Research | $1-12M | ≈9% | Increase to 15% | Redwood Research ($1.2M), Anthropic |
| Compute Governance | $1-10M | ≈7% | Increase to 12% | Government programs, policy organizations |
| Field Building & Talent | $10-15M | ≈11% | Maintain | 80,000 Hours, MATS, various fellowships |
| Governance & Policy | $1-12M | ≈9% | Increase to 12% | Coefficient Giving policy grants, government initiatives |
| International Coordination | $1-5M | ≈4% | Increase to 8% | UK/EU government initiatives (≈$14M total) |
| Epistemic Resilience | $1-4M | ≈3% | Increase to 8% | Very few dedicated funders |
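A minimal sketch of what the recommended shifts imply in dollar terms, assuming a ≈$120M external budget (the midpoint of the 2024 range) and using shares rounded from the table above:

```python
TOTAL_BUDGET_M = 120  # assumed midpoint of the $110-130M external funding range

# (current share, recommended share), rounded from the table above
SHARES = {
    "RLHF/Training Methods":        (0.25, 0.20),
    "Evaluations & Infrastructure": (0.13, 0.20),
    "AI Control Research":          (0.09, 0.15),
    "Compute Governance":           (0.07, 0.12),
    "International Coordination":   (0.04, 0.08),
    "Epistemic Resilience":         (0.03, 0.08),
}

for area, (current, recommended) in SHARES.items():
    delta_m = (recommended - current) * TOTAL_BUDGET_M
    print(f"{area:30s} {delta_m:+6.1f} $M/year")
```

On these rounded assumptions, the listed increases sum to roughly $32M/year while the RLHF reduction frees only about $6M/year, so the recommended mix implies new money rather than pure reallocation.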
2025 Funding Landscape Update
| Funder | 2024 Allocation | Focus Areas | Notes |
|---|---|---|---|
| Open Philanthropy | $63.6M | Technical safety, governance, field building | 60% of external funding |
| Jaan Tallinn | $20M | Long-term alignment research | Personal foundation |
| Eric Schmidt (Schmidt Sciences) | $10M | Safety benchmarking, adversarial evaluation | Quick Market Pitch |
| AI Safety Fund | $10M | Collaborative research (Anthropic, Google, Microsoft, OpenAI) | Frontier Model Forum |
| Future of Life Institute | $5M | Smaller grants, fellowships | Diverse portfolio |
| Steven Schuurman Foundation | €5M/year | Various AI safety initiatives | Elastic co-founder |
| Total External | $110-130M | — | 2024 estimate |
2025 Trajectory: Early data (through July 2025) shows $67M already committed, putting the year on track to exceed 2024 totals by 40-50%.
Funding Gap Analysis
The funding landscape reveals several structural imbalances:
| Gap Type | Current State | Impact | Recommended Action |
|---|---|---|---|
| Climate vs AI safety | Climate philanthropy: ≈$1-15B; AI safety: ≈$130M | ≈100x disparity despite comparable catastrophic potential | Increase AI safety funding to at least $100M-1B annually |
| Capabilities vs safety | ≈$100B in AI data center capex (2024) vs ≈$130M safety | ≈770:1 ratio | Redirect 0.5-1% of capabilities spending to safety |
| Funder concentration | Coefficient Giving: 60% of external funding | Single point of failure; limits diversity | Diversify funding sources; new initiatives like Humanity AI ($100M) |
| Talent pipeline | Over-optimized for researchers | Shortage in governance, operations, advocacy | Expand non-research talent programs |
Resource Allocation Assessment
Current vs. Recommended Allocation
| Area | Current Allocation | Recommended | Rationale |
|---|---|---|---|
| RLHF/Training | Very High | High | Deployed at scale but limited effectiveness against deceptive alignment |
| Interpretability | High | High | Rapid progress; potential for fundamental breakthroughs |
| Evaluations | High | Very High | Critical for identifying dangerous capabilities pre-deployment |
| AI Control | Medium | High | Near-term tractable; provides safety regardless of alignment |
| Compute Governance | Medium | High | One of few physical levers; already showing policy impact |
| International Coordination | Low | Medium | Low tractability but very high stakes |
| Epistemic Resilience | Very Low | Medium | Highly neglected; addresses underserved risk category |
| Field Building | Medium | Medium | Maintain current investment; returns are well-established |
Investment Concentration Risks
The current portfolio shows several structural vulnerabilities:
| Concentration Type | Current State | Risk | Mitigation |
|---|---|---|---|
| Funder concentration | Coefficient Giving provides ≈60% of external funding | Strategy changes affect entire field | Cultivate diverse funding sources |
| Geographic concentration | US and UK receive majority of funding | Limited global coordination capacity | Support emerging hubs (Berlin, Canada, Australia) |
| Frontier lab dependence | Most technical safety at Anthropic, OpenAI, DeepMind | Conflicts of interest; limited independent verification | Increase funding to MIRI ($1.1M), Redwood, ARC |
| Research over operations | Pipeline over-optimized for researchers | Shortage of governance, advocacy, operations talent | Expand non-research career paths |
| Technical over governance | Technical ≈60% vs governance ≈15% of funding | Governance may be more neglected and tractable | Rebalance toward policy research |
| Prevention over resilience | Minimal investment in post-incident recovery | No fallback if prevention fails | Develop recovery protocols |
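Funder concentration can be stated more precisely with a standard concentration measure; a minimal sketch using a Herfindahl-Hirschman-style index over the 2024 funder amounts listed earlier (the "Other" entry is an assumed remainder, added only to close the total):

```python
# Approximate 2024 external AI safety funding by source, in $M (from the funding
# table above; "Other" is an assumed remainder, not a figure from the analysis)
FUNDERS_M = {
    "Open Philanthropy": 63.6,
    "Jaan Tallinn": 20.0,
    "Eric Schmidt": 10.0,
    "AI Safety Fund": 10.0,
    "Future of Life Institute": 5.0,
    "Other": 5.0,
}

total = sum(FUNDERS_M.values())
shares = {name: amount / total for name, amount in FUNDERS_M.items()}

# Herfindahl-Hirschman index: sum of squared shares (1.0 = a single funder)
hhi = sum(share ** 2 for share in shares.values())
print(f"Top funder share: {max(shares.values()):.0%}, HHI: {hhi:.2f}")
# Values above ~0.25 are conventionally treated as highly concentrated
```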
Strategic Considerations
Worldview Dependencies
Different beliefs about AI risk lead to different portfolio recommendations:
| Worldview | Prioritize | Deprioritize |
|---|---|---|
| Alignment is very hard | Interpretability, Control, International coordination | RLHF, Voluntary commitments |
| Misuse is the main risk | Compute governance, Content authentication, Legislation | Interpretability, Agent foundations |
| Short timelines | AI Control, Evaluations, Responsible scaling | Long-term governance research |
| Racing dynamics dominate | International coordination, Compute governance | Unilateral safety research |
| Epistemic collapse is likely | Epistemic security, Content authentication | Technical alignment |
Portfolio Robustness
A robust portfolio should satisfy the following criteria, which can help evaluate current gaps and guide future allocation:
| Robustness Criterion | Current Status | Gap Assessment | Target |
|---|---|---|---|
| Cover multiple failure modes | Accident risks: 60% coverage; Misuse: 50%; Structural: 30%; Epistemic: under 15% | Medium gap | 70%+ coverage across all categories |
| Prevention and resilience | ≈95% prevention, ≈5% resilience | Large gap | 80% prevention, 20% resilience |
| Near-term and long-term balance | 55% near-term (evals, control), 45% long-term (interpretability, governance) | Small gap | Maintain current balance |
| Independent research capacity | Frontier labs: 70%+ of technical safety; Independents: under 30% | Medium gap | 50/50 split between labs and independents |
| Support multiple worldviews | Most interventions robust across scenarios | Small gap | Maintain |
| Geographic diversity | US/UK: 80%+ of funding; EU: 10%; ROW: under 10% | Medium gap | US/UK: 60%, EU: 20%, ROW: 20% |
| Funder diversity | 5 funders provide 85% of external funding; Open Philanthropy alone provides 60% | Large gap | No single funder greater than 25% |
Key Sources
| Source | Type | Relevance |
|---|---|---|
| Coefficient Giving Progress 2024 | Funder Report | Primary data on AI safety funding levels and priorities |
| AI Safety Funding Situation Overview | Analysis | Comprehensive breakdown of funding sources and gaps |
| AI Safety Needs More Funders | Policy Brief | Comparison to other catastrophic risk funding |
| AI Safety Field Growth Analysis 2025 | Research | Field growth metrics, 1,100 FTEs, 21% annual growth |
| International AI Safety Report 2025 | Global Report | 100+ authors, 30 countries, Yoshua Bengio lead |
| Future of Life AI Safety Index 2025 | Industry Assessment | 33 indicators across 6 domains for 7 leading companies |
| Open Philanthropy Technical AI Safety RFP | Grant Program | $40M allocation for technical safety research |
| Open Philanthropy Capability Evaluations RFP | Grant Program | $200K-$5M grants for evaluation infrastructure |
| America’s AI Action Plan (July 2025) | Policy | US government AI priorities including evaluations ecosystem |
| Accelerating AI Interpretability (FAS) | Policy Brief | Federal funding recommendations for interpretability |
| 80,000 Hours: AI Risk | Career Guidance | Intervention prioritization and neglectedness analysis |
| RLHF Limitations Paper | Research | Evidence on limitations of current alignment methods |
| Carnegie AI Safety as Global Public Good | Policy Analysis | International coordination challenges and research priorities |
| ITU Annual AI Governance Report 2025 | Global Report | AI governance landscape across nations |
Related Pages
- Responses Overview - Full list of interventions
- Technical Approaches - Alignment, interpretability, control
- Governance Approaches - Legislation, compute governance, international
- Risks Overview - Risk categories addressed by interventions
- AI Transition Model - Framework for understanding AI transition dynamics
AI Transition Model Context
The intervention portfolio collectively affects the AI Transition Model across all major factors:
| Factor | Key Interventions | Coverage |
|---|---|---|
| Misalignment Potential | Alignment research, interpretability, control | Technical safety |
| Civilizational Competence | Governance, institutions, epistemic tools | Coordination capacity |
| Transition Turbulence | Compute governance, international coordination | Racing dynamics |
| Misuse Potential | Resilience, authentication, detection | Harm reduction |
Portfolio balance matters: over-investment in any single intervention type creates vulnerability if that approach fails.