AI Safety Intervention Portfolio
Provides a strategic framework for AI safety resource allocation by mapping 13+ interventions against 4 risk categories, evaluating each on ITN dimensions, and identifying portfolio gaps (epistemic resilience is severely neglected; technical work is over-concentrated in frontier labs). Total field investment is roughly $650M annually with about 1,100 FTEs (21% annual growth), but 85% of external funding comes from 5 sources and the safety-to-capabilities spending ratio is only 0.5-1.3%. Recommends rebalancing away from very high RLHF investment toward evaluations (very high recommended allocation) and toward AI control and compute governance (both high recommended allocation), with epistemic resilience rising from very low to medium allocation.
Quick Assessment
| Dimension | Assessment | Evidence |
|---|---|---|
| Tractability | Medium-High | Varies widely: evaluations (high), compute governance (high), international coordination (low). Coefficient Giving's 2025 RFP allocated $40M for technical safety research. |
| Scalability | High | Portfolio approach scales across 4 risk categories and multiple timelines. AI Safety Field Growth Analysis shows 21% annual FTE growth rate. |
| Current Maturity | Medium | Core interventions established; significant gaps in epistemic resilience (less than 5% of portfolio) and post-incident recovery (under 1%). |
| Research Workforce | ≈1,100 FTEs | 600 technical + 500 non-technical AI safety FTEs in 2025, up from 400 total in 2022 (AI Safety Field Growth Analysis). |
| Time Horizon | Near-Long | Near-term (evaluations, control) complement long-term work (interpretability, governance). International AI Safety Report 2025 emphasizes urgency. |
| Funding Level | $110-130M/year external | 2024 external funding. Early 2025 shows 40-50% acceleration with $67M committed through July. Internal lab spending adds $500-550M for ≈$650M total (Coefficient Giving analysis). |
| Funding Concentration | 85% from 5 sources | Coefficient Giving: $63.6M (60%); Jaan Tallinn: $20M; Eric Schmidt: $10M; AI Safety Fund: $10M; FLI: $5M |
| Safety/Capabilities Ratio | ≈0.5-1.3% | $600-650M safety vs $50B+ capabilities spending. FAS recommends 30% of compute for safety research. |
Overview
This page provides a strategic view of the AI safety intervention landscape, analyzing how different interventions address different risk categories. Rather than examining interventions individually, this portfolio view helps identify coverage gaps, complementarities, and allocation priorities.
The intervention landscape can be divided into several categories: technical approaches (alignment, interpretability, control), governance mechanisms (legislation, compute governance, international coordination), field building (talent, funding, community), and resilience measures (epistemic security, economic adaptation). Each category has a different tractability profile, timeline, and risk coverage, and understanding these tradeoffs is essential for strategic resource allocation.
An effective safety portfolio requires both breadth (covering diverse failure modes) and depth (sufficient investment in each area to achieve impact). The current portfolio shows significant concentration in certain areas (RLHF, capability evaluations) while other areas remain relatively neglected (epistemic resilience, international coordination).
Field Growth Trajectory
| Metric | 2022 | 2025 | Growth Rate | Notes |
|---|---|---|---|---|
| Technical AI Safety FTEs | 300 | 600 | 21%/year | AI Safety Field Growth Analysis 2025 |
| Non-Technical AI Safety FTEs | 100 | 500 | 71%/year | Governance, policy, operations |
| Total AI Safety FTEs | 400 | 1,100 | 40%/year | Field-wide compound growth |
| AI Safety Organizations | ≈50 | ≈120 | 24%/year | Exponential growth since 2020 |
| Capabilities FTEs (comparison) | ≈3,000 | ≈15,000 | 30-40%/year | OpenAI alone: 300 → 3,000 |
Critical Comparison: While AI safety workforce has grown substantially, capabilities research is growing 30-40% per year. The ratio of capabilities to safety researchers has remained roughly constant at 10-15:1, meaning the absolute gap continues to widen.
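As a rough cross-check, the endpoint figures in the table can be turned into compound annual growth rates. The sketch below assumes a three-year window (2022 to 2025) and simple endpoint-to-endpoint compounding; the function name `cagr` is illustrative. Rates quoted from the cited analysis (for example 21%/year for technical FTEs) may have been fitted over a longer period, so they need not match these endpoint-implied values.

```python
# Endpoint figures copied from the Field Growth Trajectory table (2022, 2025).
HEADCOUNTS = {
    "Technical AI safety FTEs": (300, 600),
    "Non-technical AI safety FTEs": (100, 500),
    "Total AI safety FTEs": (400, 1100),
    "AI safety organizations": (50, 120),
    "Capabilities FTEs": (3000, 15000),
}
YEARS = 2025 - 2022  # assumption: growth is measured over this window


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two endpoint values."""
    return (end / start) ** (1 / years) - 1


implied = {name: cagr(a, b, YEARS) for name, (a, b) in HEADCOUNTS.items()}

for name, rate in implied.items():
    print(f"{name}: {rate:.0%} per year (endpoint-implied)")

# Capabilities-to-safety headcount ratio at each endpoint.
ratio_2022 = 3000 / 400
ratio_2025 = 15000 / 1100
print(f"Capabilities:safety ratio, 2022: {ratio_2022:.1f}:1, 2025: {ratio_2025:.1f}:1")
```

On these endpoints the total safety workforce compounds at about 40% per year, while the headcount ratio moves from 7.5:1 to roughly 13.6:1, so the absolute gap widens even when the ratio stays within the same order of magnitude.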
Top Research Categories (by FTEs):
- Miscellaneous technical AI safety research
- LLM safety
- Interpretability
Intervention Categories and Risk Coverage
Intervention by Risk Matrix
This matrix shows how strongly each major intervention addresses each risk category. Ratings are based on current evidence and expert assessments.
| Intervention | Accident Risks | Misuse Risks | Structural Risks | Epistemic Risks | Primary Mechanism |
|---|---|---|---|---|---|
| Interpretability | High | Low | Low | -- | Detect deception and misalignment in model internals |
| AI Control | High | Medium | -- | -- | External constraints regardless of AI intentions |
| Evaluations | High | Medium | Low | -- | Pre-deployment testing for dangerous capabilities |
| RLHF/Constitutional AI | Medium | Medium | -- | -- | Train models to follow human preferences |
| Scalable Oversight | Medium | Low | -- | -- | Human supervision of superhuman systems |
| Compute Governance | Low | High | Medium | -- | Hardware chokepoints limit access |
| Export Controls | Low | High | Medium | -- | Restrict adversary access to training compute |
| Responsible Scaling | Medium | Medium | Low | -- | Capability thresholds trigger safety requirements |
| International Coordination | Low | Medium | High | -- | Reduce racing dynamics through agreements |
| AI Safety Institutes | Medium | Medium | Medium | -- | Government capacity for evaluation and oversight |
| Field Building | Medium | Low | Medium | Low | Grow talent pipeline and research capacity |
| Epistemic Security | -- | Low | Low | High | Protect collective truth-finding capacity |
| Content Authentication | -- | Medium | -- | High | Verify authentic content in synthetic era |
Legend: High = primary focus, addresses directly; Medium = secondary impact; Low = indirect or limited; -- = minimal relevance
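One way to read the matrix for coverage gaps is to count, per risk category, how many interventions are rated High or Medium. The sketch below transcribes the ratings above; the tallying rule (High plus Medium counts as meaningful coverage) is an assumption made for illustration, not a method stated by the sources.

```python
from collections import Counter

# Ratings transcribed from the matrix; "--" means minimal relevance.
RISKS = ("Accident", "Misuse", "Structural", "Epistemic")
MATRIX = {
    "Interpretability": ("High", "Low", "Low", "--"),
    "AI Control": ("High", "Medium", "--", "--"),
    "Evaluations": ("High", "Medium", "Low", "--"),
    "RLHF/Constitutional AI": ("Medium", "Medium", "--", "--"),
    "Scalable Oversight": ("Medium", "Low", "--", "--"),
    "Compute Governance": ("Low", "High", "Medium", "--"),
    "Export Controls": ("Low", "High", "Medium", "--"),
    "Responsible Scaling": ("Medium", "Medium", "Low", "--"),
    "International Coordination": ("Low", "Medium", "High", "--"),
    "AI Safety Institutes": ("Medium", "Medium", "Medium", "--"),
    "Field Building": ("Medium", "Low", "Medium", "Low"),
    "Epistemic Security": ("--", "Low", "Low", "High"),
    "Content Authentication": ("--", "Medium", "--", "High"),
}


def coverage(matrix: dict, risks: tuple) -> dict:
    """Count High and Medium ratings in each risk column."""
    result = {}
    for idx, risk in enumerate(risks):
        counts = Counter(ratings[idx] for ratings in matrix.values())
        result[risk] = {"High": counts["High"], "Medium": counts["Medium"]}
    return result


summary = coverage(MATRIX, RISKS)
# Assumption: High + Medium is treated as "meaningful" coverage.
thinnest = min(summary, key=lambda r: summary[r]["High"] + summary[r]["Medium"])

for risk, counts in summary.items():
    print(f"{risk}: {counts['High']} High, {counts['Medium']} Medium")
print("Thinnest coverage:", thinnest)
```

Under this tally, epistemic risks have the fewest interventions rated High or Medium, which is consistent with the gap the page identifies.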
Prioritization Framework
This framework evaluates interventions across the standard Importance-Tractability-Neglectedness (ITN) dimensions, with additional consideration for timeline fit and portfolio complementarity.
| Intervention | Tractability | Impact Potential | Neglectedness | Timeline Fit | Overall Priority |
|---|---|---|---|---|---|
| Interpretability | Medium | High | Low | Long | High |
| AI Control | High | Medium-High | Medium | Near | Very High |
| Evaluations | High | Medium | Low | Near | High |
| Compute Governance | High | High | Low | Near | Very High |
| International Coordination | Low | Very High | High | Long | High |
| Field Building | High | Medium | Medium | Ongoing | Medium-High |
| Epistemic Resilience | Medium | Medium | High | Near-Long | Medium-High |
| Scalable Oversight | Medium-Low | High | Medium | Long | Medium |
Prioritization Rationale
Very High Priority:
- AI Control scores highly because it provides near-term safety benefits (70-85% tractability for human-level systems) regardless of whether alignment succeeds. It represents a practical bridge during the transition period. Redwood Research received $1.2M for control research in 2024.
- Compute Governance is one of few levers creating physical constraints on AI development. Hardware chokepoints exist, some measures are already implemented (EU AI Act compute thresholds, US export controls), and impact potential is substantial. GovAI produces leading research on compute governance mechanisms.
High Priority:
- Interpretability is potentially essential if alignment proves difficult (only reliable way to detect sophisticated deception). MIT Technology Review named mechanistic interpretability a 2026 Breakthrough Technology. Anthropic's attribution graphs revealed hidden reasoning in Claude 3.5 Haiku. FAS recommends federal R&D funding through DARPA and NSF.
- Evaluations provide measurable near-term impact and are already standard practice at major labs. Coefficient Giving launched an RFP for capability evaluations ($200K-$5M grants). METR partners with Anthropic and OpenAI on frontier model evaluations. NIST invested $20M in AI Economic Security Centers.
- International Coordination has very high impact potential for addressing structural risks like racing dynamics, but low tractability given current geopolitical tensions. The International AI Safety Report 2025, led by Yoshua Bengio and written by 96 experts from 30 countries, represents the largest global collaboration to date.
Medium-High Priority:
- Field Building and Epistemic Resilience are relatively neglected meta-level interventions that multiply the effectiveness of direct technical and governance work. 80,000 Hours notes that good funding opportunities exist in AI safety for qualified researchers.
Portfolio Gaps and Complementarities
Coverage Gaps
Analysis of the current intervention portfolio reveals several areas where coverage is thin:
| Gap Area | Current Investment | Risk Exposure | Recommended Action |
|---|---|---|---|
| Epistemic Risks | Under 5% of portfolio ($3-5M/year) | Epistemic collapse, reality fragmentation | Increase to 8-10% of portfolio; invest in content authentication and epistemic infrastructure |
| Long-term Structural Risks | 4-6% of portfolio; international coordination is low tractability | Lock-in, concentration of power | Develop alternative coordination mechanisms; invest in governance research |
| Post-Incident Recovery | Under 1% of portfolio | All risk categories | Develop recovery protocols and resilience measures; allocate 3-5% of portfolio |
| Misuse by State Actors | Export controls are primary lever; $5-10M in policy research | Authoritarian tools, surveillance | Research additional governance mechanisms; increase to $15-25M |
| Independent Evaluation Capacity | 70%+ of evals done by labs themselves | Conflict of interest, verification gaps | Coefficient Giving's eval RFP addresses this with $200K-$5M grants |
Key Complementarities
Certain interventions work better together than in isolation:
Technical + Governance:
- AI Evaluations inform Responsible Scaling Policies (RSPs) thresholds
- Interpretability enables verification for
- AI Control provides safety margin while governance matures
Near-term + Long-term:
- Compute Governance buys time for Interpretability research
- AI Evaluations identify near-term risks while Scalable Oversight develops
- AI Safety Field Building and Community ensures capacity for future technical work
Prevention + Resilience:
- Technical safety research aims to prevent failures
- AI-Era Epistemic Security and economic resilience limit damage if prevention fails
- Both are needed for robust defense-in-depth
Portfolio Funding Allocation
The following table estimates 2024 funding levels by intervention area and compares them to recommended allocations based on neglectedness and impact potential. Total external AI safety funding was approximately $110-130 million in 2024, with Coefficient Giving providing ~60% of this amount.
| Intervention Area | Est. 2024 Funding | % of Total | Recommended Shift | Key Funders |
|---|---|---|---|---|
| RLHF/Training Methods | $15-35M | ≈25% | Decrease to 20% | Frontier labs (internal), academic grants |
| Interpretability | $15-25M | ≈18% | Maintain | Coefficient Giving, Superalignment Fast Grants ($10M) |
| Evaluations & Evals Infrastructure | $12-18M | ≈13% | Increase to 20% | CAIS ($1.5M), UK AISI, labs |
| AI Control Research | $1-12M | ≈9% | Increase to 15% | Redwood Research ($1.2M), Anthropic |
| Compute Governance | $1-10M | ≈7% | Increase to 12% | Government programs, policy organizations |
| Field Building & Talent | $10-15M | ≈11% | Maintain | 80,000 Hours, MATS, various fellowships |
| Governance & Policy | $1-12M | ≈9% | Increase to 12% | Coefficient Giving policy grants, government initiatives |
| International Coordination | $1-5M | ≈4% | Increase to 8% | UK/EU government initiatives (≈$14M total) |
| Epistemic Resilience | $1-4M | ≈3% | Increase to 8% | Very few dedicated funders |
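A quick consistency check on the percentage columns: the current shares sum to about 99%, but the recommended targets, read literally, sum to more than 100%. The sketch below assumes "Maintain" means keeping the current share and rescales the targets proportionally so they fit a fixed budget. Proportional rescaling is an illustrative assumption, not the source's method; the targets could equally be read as applying to a larger total budget.

```python
# Percent of total external funding, from the allocation table.
CURRENT = {
    "RLHF/Training Methods": 25, "Interpretability": 18,
    "Evaluations & Evals Infrastructure": 13, "AI Control Research": 9,
    "Compute Governance": 7, "Field Building & Talent": 11,
    "Governance & Policy": 9, "International Coordination": 4,
    "Epistemic Resilience": 3,
}
# Recommended targets; "Maintain" is assumed to mean the current share.
RECOMMENDED = {
    "RLHF/Training Methods": 20, "Interpretability": 18,
    "Evaluations & Evals Infrastructure": 20, "AI Control Research": 15,
    "Compute Governance": 12, "Field Building & Talent": 11,
    "Governance & Policy": 12, "International Coordination": 8,
    "Epistemic Resilience": 8,
}

current_total = sum(CURRENT.values())
recommended_total = sum(RECOMMENDED.values())
print(f"Current shares sum to {current_total}%")
print(f"Recommended targets sum to {recommended_total}%")

# Rescale the targets so they sum to 100% of a fixed budget.
normalized = {k: v * 100 / recommended_total for k, v in RECOMMENDED.items()}
for area, pct in normalized.items():
    change = pct - CURRENT[area]
    print(f"{area}: {CURRENT[area]}% -> {pct:.1f}% ({change:+.1f} pts)")
```

After rescaling, the direction of every recommended shift is preserved, but the magnitudes shrink; for example, epistemic resilience lands near 6.5% rather than 8%.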
2025 Funding Landscape Update
| Funder | 2024 Allocation | Focus Areas | Notes / Source |
|---|---|---|---|
| Coefficient Giving | $63.6M | Technical safety, governance, field building | 60% of external funding |
| Jaan Tallinn | $20M | Long-term alignment research | Personal foundation |
| Eric Schmidt (Schmidt Sciences) | $10M | Safety benchmarking, adversarial evaluation | Quick Market Pitch |
| AI Safety Fund | $10M | Collaborative research (Anthropic, Google, Microsoft, OpenAI) | Frontier Model Forum |
| Future of Life Institute | $5M | Smaller grants, fellowships | Diverse portfolio |
| Steven Schuurman Foundation | €5M/year | Various AI safety initiatives | Elastic co-founder |
| Total External | $110-130M | — | 2024 estimate |
2025 Trajectory: Early data (through July 2025) shows $67M already committed, putting the year on track to exceed 2024 totals by 40-50%.
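The concentration figures depend on which total is used as the denominator. The sketch below recomputes the top-five and single-funder shares at both ends of the $110-130M range, using the 2024 allocations from the table. The Steven Schuurman Foundation is omitted because its figure is in euros; that omission is a simplification for illustration.

```python
# 2024 allocations in millions of USD, from the funder table.
# The Steven Schuurman Foundation (EUR 5M/year) is left out to avoid
# mixing currencies.
TOP_FUNDERS = {
    "Coefficient Giving": 63.6,
    "Jaan Tallinn": 20.0,
    "Eric Schmidt (Schmidt Sciences)": 10.0,
    "AI Safety Fund": 10.0,
    "Future of Life Institute": 5.0,
}
TOTAL_RANGE = (110.0, 130.0)  # estimated total external funding, $M

top5 = sum(TOP_FUNDERS.values())
largest = TOP_FUNDERS["Coefficient Giving"]

shares = {}
for total in TOTAL_RANGE:
    shares[total] = (top5 / total, largest / total)
    print(
        f"Total ${total:.0f}M: top five = {top5 / total:.0%}, "
        f"largest funder = {largest / total:.0%}"
    )

# Total that would make the largest funder exactly 60% of the field.
implied_total = largest / 0.60
print(f"A 60% share for the largest funder implies a total of ${implied_total:.0f}M")
```

At the upper end of the range the top five account for roughly 84% of external funding, in line with the 85% headline; the 60% single-funder share corresponds to a total near the lower end of the range.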
Funding Gap Analysis
The funding landscape reveals several structural imbalances:
| Gap Type | Current State | Impact | Recommended Action |
|---|---|---|---|
| Climate vs AI safety | Climate philanthropy: ≈$1-15B; AI safety: ≈$130M | ≈100x disparity despite comparable catastrophic potential | Increase AI safety funding to at least $100M-1B annually |
| Capabilities vs safety | ≈$100B in AI data center capex (2024) vs ≈$130M safety | ≈1500:1 ratio | Redirect 0.5-1% of capabilities spending to safety |
| Funder concentration | Coefficient Giving: 60% of external funding | Single point of failure; limits diversity | Diversify funding sources; new initiatives like Humanity AI ($100M) |
| Talent pipeline | Over-optimized for researchers | Shortage in governance, operations, advocacy | Expand non-research talent programs |
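The spending ratios can be recomputed directly from the dollar figures quoted on this page. In the sketch below, the variable names are illustrative and the base for the "0.5-1% redirect" is assumed to be data center capex, which the table does not specify. With the figures exactly as quoted, capex to external safety funding works out to roughly 770:1; a ratio near 1500:1 would correspond to about $200B of capex, so the inputs may come from different estimates.

```python
# Dollar figures as quoted on this page (USD, 2024 estimates).
EXTERNAL_SAFETY = 130e6        # upper end of external safety funding
TOTAL_SAFETY = (600e6, 650e6)  # external plus internal lab spending
CAPABILITIES_SPEND = 50e9      # "$50B+" capabilities spending
DATA_CENTER_CAPEX = 100e9      # AI data center capex

# Capex dollars per dollar of external safety funding.
capex_ratio = DATA_CENTER_CAPEX / EXTERNAL_SAFETY

# Total safety spending as a share of capabilities spending.
safety_share = tuple(s / CAPABILITIES_SPEND for s in TOTAL_SAFETY)

# Assumption: the 0.5-1% redirect is applied to data center capex.
redirect = tuple(p * DATA_CENTER_CAPEX for p in (0.005, 0.01))

print(f"Capex : external safety = {capex_ratio:.0f}:1")
print(f"Safety share of capabilities spending: "
      f"{safety_share[0]:.1%} to {safety_share[1]:.1%}")
print(f"0.5-1% of capex = ${redirect[0] / 1e9:.1f}B to ${redirect[1] / 1e9:.1f}B")
```

Redirecting 0.5-1% of a $100B base would yield $0.5-1B per year, several times current external funding.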
Resource Allocation Assessment
Current vs. Recommended Allocation
| Area | Current Allocation | Recommended | Rationale |
|---|---|---|---|
| RLHF/Training | Very High | High | Deployed at scale but limited effectiveness against deceptive alignment |
| Interpretability | High | High | Rapid progress; potential for fundamental breakthroughs |
| Evaluations | High | Very High | Critical for identifying dangerous capabilities pre-deployment |
| AI Control | Medium | High | Near-term tractable; provides safety regardless of alignment |
| Compute Governance | Medium | High | One of few physical levers; already showing policy impact |
| International Coordination | Low | Medium | Low tractability but very high stakes |
| Epistemic Resilience | Very Low | Medium | Highly neglected; addresses underserved risk category |
| Field Building | Medium | Medium | Maintain current investment; returns are well-established |
Investment Concentration Risks
The current portfolio shows several structural vulnerabilities:
| Concentration Type | Current State | Risk | Mitigation |
|---|---|---|---|
| Funder concentration | Coefficient Giving provides ≈60% of external funding | Strategy changes affect entire field | Cultivate diverse funding sources |
| Geographic concentration | US and UK receive majority of funding | Limited global coordination capacity | Support emerging hubs (Berlin, Canada, Australia) |
| Frontier lab dependence | Most technical safety at Anthropic, OpenAI, DeepMind | Conflicts of interest; limited independent verification | Increase funding to MIRI ($1.1M), Redwood, ARC |
| Research over operations | Pipeline over-optimized for researchers | Shortage of governance, advocacy, operations talent | Expand non-research career paths |
| Technical over governance | Technical ~60% vs governance ≈15% of funding | Governance may be more neglected and tractable | Rebalance toward policy research |
| Prevention over resilience | Minimal investment in post-incident recovery | No fallback if prevention fails | Develop recovery protocols |
Strategic Considerations
Worldview Dependencies
Different beliefs about AI risk lead to different portfolio recommendations:
| Worldview | Prioritize | Deprioritize |
|---|---|---|
| Alignment is very hard | Interpretability, Control, International coordination | RLHF, Voluntary commitments |
| Misuse is the main risk | Compute governance, Content authentication, Legislation | Interpretability, Agent foundations |
| Short timelines | AI Control, Evaluations, Responsible scaling | Long-term governance research |
| Racing dynamics dominate | International coordination, Compute governance | Unilateral safety research |
| Epistemic collapse is likely | Epistemic security, Content authentication | Technical alignment |
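The table can be combined with a set of credences to see which interventions hold up across worldviews. The sketch below is purely illustrative: the equal credences are hypothetical placeholders, the scoring rule (add the credence when a worldview prioritizes an intervention, subtract it when it deprioritizes) is an assumption, and intervention names are lightly normalized.

```python
# Rows transcribed from the worldview table; names lightly normalized
# (for example "Control" becomes "AI Control").
WORLDVIEWS = {
    "Alignment is very hard": (
        ["Interpretability", "AI Control", "International coordination"],
        ["RLHF", "Voluntary commitments"],
    ),
    "Misuse is the main risk": (
        ["Compute governance", "Content authentication", "Legislation"],
        ["Interpretability", "Agent foundations"],
    ),
    "Short timelines": (
        ["AI Control", "Evaluations", "Responsible scaling"],
        ["Long-term governance research"],
    ),
    "Racing dynamics dominate": (
        ["International coordination", "Compute governance"],
        ["Unilateral safety research"],
    ),
    "Epistemic collapse is likely": (
        ["Epistemic security", "Content authentication"],
        ["Technical alignment"],
    ),
}

# HYPOTHETICAL credences: equal weight on each worldview.
CREDENCES = {name: 0.2 for name in WORLDVIEWS}


def net_support(worldviews: dict, credences: dict) -> dict:
    """Credence-weighted count of prioritize minus deprioritize mentions."""
    scores = {}
    for name, (prioritize, deprioritize) in worldviews.items():
        weight = credences[name]
        for item in prioritize:
            scores[item] = scores.get(item, 0.0) + weight
        for item in deprioritize:
            scores[item] = scores.get(item, 0.0) - weight
    return scores


scores = net_support(WORLDVIEWS, CREDENCES)
for item, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{item:32s} {score:+.1f}")
```

Changing `CREDENCES` shows how sensitive the ranking is to one's beliefs; interventions that stay positive under many different weightings are the ones most robust to worldview uncertainty.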
Portfolio Robustness
A robust portfolio should satisfy the following criteria, which can help evaluate current gaps and guide future allocation:
| Robustness Criterion | Current Status | Gap Assessment | Target |
|---|---|---|---|
| Cover multiple failure modes | Accident risks: 60% coverage; Misuse: 50%; Structural: 30%; Epistemic: under 15% | Medium gap | 70%+ coverage across all categories |
| Prevention and resilience | ~95% prevention, ≈5% resilience | Large gap | 80% prevention, 20% resilience |
| Near-term and long-term balance | 55% near-term (evals, control), 45% long-term (interpretability, governance) | Small gap | Maintain current balance |
| Independent research capacity | Frontier labs: 70%+ of technical safety; Independents: under 30% | Medium gap | 50/50 split between labs and independents |
| Support multiple worldviews | Most interventions robust across scenarios | Small gap | Maintain |
| Geographic diversity | US/UK: 80%+ of funding; EU: 10%; ROW: under 10% | Medium gap | US/UK: 60%, EU: 20%, ROW: 20% |
| Funder diversity | 5 funders provide 85% of external funding; Coefficient Giving alone provides 60% | Large gap | No single funder greater than 25% |
Key Sources
| Source | Type | Relevance |
|---|---|---|
| Coefficient Giving Progress 2024 | Funder Report | Primary data on AI safety funding levels and priorities |
| AI Safety Funding Situation Overview | Analysis | Comprehensive breakdown of funding sources and gaps |
| AI Safety Needs More Funders | Policy Brief | Comparison to other catastrophic risk funding |
| AI Safety Field Growth Analysis 2025 | Research | Field growth metrics, 1,100 FTEs, 21% annual growth |
| International AI Safety Report 2025 | Global Report | 96 experts, 30 countries, led by Yoshua Bengio |
| Future of Life AI Safety Index 2025 | Industry Assessment | 33 indicators across 6 domains for 7 leading companies |
| Coefficient Giving Technical AI Safety RFP | Grant Program | $40M allocation for technical safety research |
| Coefficient Giving Capability Evaluations RFP | Grant Program | $200K-$5M grants for evaluation infrastructure |
| America's AI Action Plan (July 2025) | Policy | US government AI priorities including evaluations ecosystem |
| Accelerating AI Interpretability (FAS) | Policy Brief | Federal funding recommendations for interpretability |
| 80,000 Hours: AI Risk | Career Guidance | Intervention prioritization and neglectedness analysis |
| RLHF Limitations Paper | Research | Evidence on limitations of current alignment methods |
| Carnegie AI Safety as Global Public Good | Policy Analysis | International coordination challenges and research priorities |
| ITU Annual AI Governance Report 2025 | Global Report | AI governance landscape across nations |
References
Comprehensive study tracking the expansion of technical and non-technical AI safety fields from 2010 to 2025. Documents growth from approximately 400 to 1,100 full-time equivalent researchers across both domains.
The International AI Safety Report 2025 provides a global scientific assessment of general-purpose AI capabilities, risks, and potential management techniques. It represents a collaborative effort by 96 experts from 30 countries to establish a shared understanding of AI safety challenges.
Open Philanthropy provides grants across multiple domains including global health, catastrophic risks, and scientific progress. Their focus spans technological, humanitarian, and systemic challenges.
Open Philanthropy reviewed its philanthropic efforts in 2024, focusing on expanding partnerships, supporting AI safety research, and making strategic grants across multiple domains including global health and catastrophic risk reduction.
Mechanistic interpretability: 10 Breakthrough Technologies 2026, MIT Technology Review.
An Overview of the AI Safety Funding Situation, Stephen McAleese, LessWrong, 2023 (blog post).
Analyzes AI safety funding from sources like Open Philanthropy, Survival and Flourishing Fund, and academic institutions. Estimates total global AI safety spending and explores talent versus funding constraints.
The FLI AI Safety Index Summer 2025 assesses leading AI companies' safety efforts, finding widespread inadequacies in risk management and existential safety planning. Anthropic leads with a C+ grade, while most companies score poorly across critical safety domains.