AI Safety Training Programs
Comprehensive guide to AI safety training programs, including MATS (78% of alumni in alignment work, 100+ scholars annually), the Anthropic Fellows Program ($2,100/week stipend, 40%+ hired full-time), LASR Labs (all 5 Summer 2024 papers accepted to NeurIPS workshops), and academic pathways. BlueDot Impact has trained 7,000+ people since 2022, with hundreds now working in AI safety. Provides concrete application criteria, timing recommendations, and structured self-study pathways, with a 1-5 year timeline to research contribution.
Quick Assessment
| Dimension | Assessment | Evidence |
|---|---|---|
| Tractability | High | Known how to train researchers; programs have proven track records |
| Scalability | Medium | Bottlenecked by mentor availability and quality maintenance |
| Current Maturity | Medium-High | Ecosystem established since 2021; 298+ MATS scholars trained |
| Time Horizon | 1-5 years | Trained researchers take 1-3 years to contribute meaningfully |
| Key Proponents | Many | MATS, BlueDot Impact, Anthropic, Coefficient Giving |
| Estimated Impact | Medium-High | Produces 100-200 new safety researchers annually |
Overview
The AI safety field faces a critical talent bottleneck. While funding has increased substantially—with Coefficient Giving committing roughly $50 million to technical AI safety research in 2024—the supply of researchers capable of doing high-quality technical safety work remains constrained. Training programs represent the primary pipeline for addressing this gap, offering structured pathways from general ML expertise to safety-specific research skills.
The landscape has evolved rapidly since 2020. MATS (ML Alignment Theory Scholars) has become the premier research mentorship program, with 78% of surveyed alumni now working in AI alignment. Anthropic launched a Fellows Program specifically for mid-career transitions. BlueDot Impact has trained over 7,000 people since 2022, with hundreds now working at organizations like Anthropic, OpenAI, and the UK AI Safety Institute. Academic programs are emerging at York (SAINTS CDT), Berkeley (CHAI), and Cambridge (CHIA). Independent research programs like SPAR and LASR Labs provide part-time pathways. Together, these programs produce perhaps 100-200 new safety researchers annually—a number that may be insufficient given the pace of AI capabilities advancement.
The strategic importance of training extends beyond individual researcher production. Programs shape research culture, determine which problems receive attention, and create networks that influence the field's direction. How training programs select participants, what methodologies they emphasize, and which mentors they feature all have downstream effects on AI safety's trajectory.
Program Comparison
| Program | Duration | Format | Stipend | Selectivity | Key Outcomes |
|---|---|---|---|---|---|
| MATS | 12 weeks + 6mo extension | In-person (Berkeley, London) | Living stipend | ≈5-10% | 78% in alignment work; 75% publish |
| Anthropic Fellows | 6 months | In-person (SF) | $2,100/week | Selective | 40%+ hired full-time at Anthropic |
| LASR Labs | 13 weeks | In-person (London) | £11,000 | Moderate | All 5 Summer 2024 papers at NeurIPS workshops |
| SPAR | 3 months | Remote, part-time | Varies | Moderate | Papers at ICML, NeurIPS; career fair |
| ARENA | 5 weeks | In-person (London) | Housing/travel | Moderate | Alumni at Apollo, METR, UK AISI |
| BlueDot Technical AI Safety | 8 weeks | Online cohorts | None | Low-moderate | 7,000+ trained; hundreds in field |
Major Training Programs
MATS (ML Alignment Theory Scholars)
MATS is the most established and influential AI safety research program, operating as an intensive mentorship connecting promising researchers with leading safety researchers. Since its inception in late 2021, MATS has supported 298 scholars and 75 mentors.
| Attribute | Details |
|---|---|
| Duration | 12 weeks intensive + 6 months extension |
| Format | In-person (Berkeley, London) |
| Focus | Technical alignment research |
| Mentors | Researchers from Anthropic, DeepMind, Redwood, FAR.AI, ARC |
| Compensation | Living stipend provided |
| Selectivity | ≈5-10% acceptance rate |
| Alumni outcomes | 78% now working in AI alignment |
Research Areas:
- Interpretability and mechanistic understanding
- AI control and containment
- Scalable oversight
- Evaluations and red-teaming
- Robustness and security
Notable Alumni Contributions: MATS scholars have contributed to sparse autoencoders for interpretability, activation engineering research, developmental interpretability, and externalized reasoning oversight, with publications at ICML and NeurIPS on safety-relevant topics. Nina Rimsky received an Outstanding Paper Award at ACL 2024 for "Steering Llama 2 via Contrastive Activation Addition." Alumni have founded organizations including Apollo Research, Timaeus, Leap Labs, and the Center for AI Policy.
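The contrastive activation addition technique cited above builds a steering vector from the difference between mean activations on behavior-exhibiting and contrastive prompts, then adds it to the model's residual stream at inference time. A minimal toy sketch in NumPy (the random activations, dimensions, and `steer` helper here are hypothetical illustrations, not the paper's implementation):

```python
import numpy as np

# Toy stand-ins for residual-stream activations at one transformer layer:
# rows are prompts, columns are hidden dimensions (synthetic data).
rng = np.random.default_rng(0)
pos_acts = rng.normal(loc=1.0, size=(8, 16))   # prompts exhibiting the target behavior
neg_acts = rng.normal(loc=-1.0, size=(8, 16))  # contrastive prompts without it

# The steering vector is the difference of the two mean activations.
steering_vector = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)

def steer(hidden_state, alpha=1.0):
    """Add the scaled steering vector to a hidden state during the forward pass."""
    return hidden_state + alpha * steering_vector

h = rng.normal(size=16)           # a hidden state to steer
h_steered = steer(h, alpha=2.0)   # alpha controls steering strength
```

In practice the vector is computed from real model activations at a chosen layer and injected via forward hooks; the scale `alpha` trades off behavior change against output degradation.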
Anthropic Fellows Program
Launched in 2024, the Anthropic Fellows Program targets mid-career technical professionals transitioning into AI safety research.
| Attribute | Details |
|---|---|
| Duration | 6 months full-time |
| Format | In-person (San Francisco) |
| Focus | Transition to safety research |
| Compensation | $2,100/week stipend + $15,000/month compute budget |
| Target | Mid-career technical professionals |
| First cohort | March 2025 |
| First cohort outcomes | Over 80% published papers; 40%+ joined Anthropic full-time |
The program addresses a specific gap: talented ML engineers and researchers who want to transition to safety work but lack the mentorship and runway to do so. By providing substantial compensation and direct collaboration with Anthropic researchers, it removes financial barriers to career change. First cohort fellows produced notable research including work on agentic misalignment, attribution graphs for mechanistic interpretability, and autonomous blockchain vulnerability exploitation.
SPAR (Supervised Program for Alignment Research)
SPAR offers a part-time, remote research fellowship enabling broader participation in safety research without requiring full-time commitment.
| Attribute | Details |
|---|---|
| Duration | 3 months |
| Format | Remote, part-time |
| Focus | AI safety and governance research |
| Target | Students and professionals |
| Output | Research projects culminating in Demo Day with career fair |
| Scale | 130+ projects offered in Spring 2026, the largest AI safety fellowship round to date |
SPAR research has been accepted at ICML and NeurIPS, covered by TIME, and led to full-time job offers. Mentors come from Google DeepMind, RAND, Apollo Research, UK AISI, MIRI, and universities including Cambridge, Harvard, Oxford, and MIT. The program works well for:
- Graduate students exploring safety research
- Professionals testing interest before career change
- Researchers in adjacent fields wanting to contribute
LASR Labs
LASR Labs provides cohort-based technical AI safety research, preparing participants for roles at safety organizations.
| Attribute | Details |
|---|---|
| Duration | 13 weeks |
| Format | In-person (London) |
| Focus | Technical safety research |
| Stipend | £11,000 + office space, food, travel |
| 2024 Outcomes | All 5 Summer 2024 papers accepted to NeurIPS workshops |
| Career Outcomes | Alumni at UK AISI, Apollo Research, OpenAI dangerous capabilities team, Coefficient Giving |
| Satisfaction | 9.25/10 likelihood to recommend; NPS +75 |
Research topics include interpretability (sparse autoencoders, residual streams), AI control, and steganographic collusion in LLMs. Supervisors include researchers from Google DeepMind, Anthropic, and UK AISI.
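Sparse autoencoders, listed among the research topics above, decompose model activations into a wider dictionary of sparsely active features. A toy forward pass and loss (untrained random weights and toy dimensions, for illustration only; real SAEs are trained on large batches of model activations):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_features = 16, 64   # toy sizes; real SAEs use much wider feature dictionaries

# Untrained encoder/decoder weights, for illustration.
W_enc = rng.normal(scale=0.1, size=(d_model, d_features))
W_dec = rng.normal(scale=0.1, size=(d_features, d_model))
b_enc = np.zeros(d_features)

def sae_forward(x):
    """Encode an activation vector into non-negative features, then reconstruct it."""
    features = np.maximum(x @ W_enc + b_enc, 0.0)  # ReLU keeps features non-negative
    reconstruction = features @ W_dec
    return features, reconstruction

def sae_loss(x, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that drives features toward sparsity."""
    features, reconstruction = sae_forward(x)
    return float(np.mean((x - reconstruction) ** 2) + l1_coeff * np.abs(features).sum())

x = rng.normal(size=d_model)              # stand-in for a residual-stream activation
features, reconstruction = sae_forward(x)
```

Training minimizes `sae_loss` over many activations; the resulting sparse features often correspond to human-interpretable concepts, which is what makes the technique useful for interpretability work.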
Global AI Safety Fellowship
Impact Academy's Global AI Safety Fellowship is a fully funded program (up to 6 months) connecting exceptional STEM talent with leading safety organizations.
| Attribute | Details |
|---|---|
| Duration | Up to 6 months |
| Format | In-person collaboration |
| Partners | CHAI (Berkeley), Conjecture, FAR.AI, UK AISI |
| Funding | Fully funded |
Academic Pathways
PhD Programs
| Program | Institution | Focus | Status |
|---|---|---|---|
| SAINTS CDT | University of York (UK) | Safe Autonomy | Accepting applications |
| CHAI | UC Berkeley | Human-Compatible AI | Established |
| CHIA | Cambridge | Human-Inspired AI | Active |
| Steinhardt Lab | UC Berkeley | ML Safety | Active |
| Other ML programs | Various | General ML with safety focus | Many options |
University of York - SAINTS CDT: The UK's first Centre for Doctoral Training specifically focused on AI safety, funded by UKRI. Brings together computer science, philosophy, law, sociology, and economics to train the next generation of safe AI experts. Based at the Institute for Safe Autonomy.
Key Academic Researchers: Prospective PhD students should consider advisors who work on safety-relevant topics:
- Stuart Russell (Berkeley/CHAI) - Human-compatible AI
- Jacob Steinhardt (Berkeley) - ML safety and robustness
- Vincent Conitzer (CMU) - AI alignment theory
- David Duvenaud (Toronto) - Interpretability
- Roger Grosse (Toronto) - Training dynamics
- Victor Veitch (Chicago) - Causal ML, safety
Academic vs. Industry Research
| Dimension | Academic Path | Industry Path |
|---|---|---|
| Timeline | 4-6 years | 0-2 years to entry |
| Research freedom | High | Varies |
| Resources | Limited | Often substantial |
| Publication | Expected | Sometimes restricted |
| Salary during training | PhD stipend (≈$10-50K) | Full salary or fellowship |
| Ultimate outcome | Research career | Research career |
| Best for | Deep expertise, theory | Immediate impact, applied |
Upskilling Resources
For those not yet ready for formal programs or preferring self-directed learning:
Structured Curricula
| Resource | Provider | Coverage | Time Investment |
|---|---|---|---|
| AI Safety Syllabus | 80,000 Hours | Comprehensive reading list | 40-100+ hours |
| Technical AI Safety Course | BlueDot Impact | Structured curriculum | 8 weeks |
| AI Safety Operations Bootcamp | BlueDot Impact | Operations roles in AI safety | Intensive |
| ML Safety Course | Dan Hendrycks | Technical foundations | Semester |
| ARENA | ARENA | Technical implementations (mech interp, transformers) | 5 weeks |
BlueDot Impact has become the primary entry point into the AI safety field, training over 7,000 people since 2022 and raising $35M in total funding, including $25M in 2025. ARENA alumni have gone on to become MATS scholars, LASR participants, and AI safety engineers at Apollo Research, METR, and UK AISI.
Self-Study Path
For self-directed learners, a typical progression through the resources above:
1. Build ML fundamentals through courses and implementation projects
2. Learn the safety landscape via the 80,000 Hours AI Safety Syllabus or BlueDot's Technical AI Safety Course
3. Develop technical depth with Dan Hendrycks' ML Safety Course and ARENA-style implementation work
4. Complete a pilot research project that demonstrates research capability
5. Apply to formal programs (SPAR part-time, or MATS/LASR Labs full-time)
Career Transition Considerations
When to Apply to Programs
| Your Situation | Recommended Path |
|---|---|
| Strong ML background, want safety focus | MATS or Anthropic Fellows |
| Exploring interest, employed | SPAR (part-time) |
| Student, want research experience | LASR Labs, SPAR |
| Early career, want PhD | Academic programs |
| Mid-career, want full transition | Anthropic Fellows |
| Strong background, want independence | Self-study + independent research |
Success Factors
Based on program outcomes, successful applicants typically have:
| Factor | Importance | How to Develop |
|---|---|---|
| ML technical skills | Critical | Courses, projects, publications |
| Research experience | High | Academic or industry research |
| Safety knowledge | Medium-High | Reading, courses, writing |
| Communication | Medium | Writing, presentations |
| Clear research interests | Medium | Reading, reflection, pilot projects |
Common Failure Modes
| Failure Mode | Description | Mitigation |
|---|---|---|
| Premature application | Applying without sufficient ML skills | Build fundamentals first |
| No research output | Nothing demonstrating research capability | Complete pilot project |
| Vague interests | Unable to articulate what you want to work on | Read extensively, form views |
| Poor fit | Mismatch between interests and program | Research programs carefully |
| Giving up early | Rejection discouragement | Multiple applications, iterate |
Talent Pipeline Analysis
Current Capacity
| Stage | Annual Output | Bottleneck |
|---|---|---|
| Interested individuals | Thousands | Conversion |
| Program applicants | 500-1000 | Selectivity |
| Program participants | 150-300 | Capacity |
| Research-productive alumni | 100-200 | Mentorship |
| Long-term field contributors | 50-100 | Retention |
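Taking midpoints of the ranges in the table gives rough stage-to-stage conversion rates. A small script makes the implied funnel explicit (the midpoint figures, and treating "thousands" as 2,000, are my own simplifications for illustration):

```python
# Midpoints of the annual-output ranges from the pipeline table above;
# "thousands" of interested individuals is treated as 2,000 for illustration.
pipeline = {
    "Interested individuals": 2000,
    "Program applicants": 750,           # midpoint of 500-1,000
    "Program participants": 225,         # midpoint of 150-300
    "Research-productive alumni": 150,   # midpoint of 100-200
    "Long-term field contributors": 75,  # midpoint of 50-100
}

# Conversion rate implied between each consecutive pair of stages.
names = list(pipeline)
rates = {
    f"{a} -> {b}": pipeline[b] / pipeline[a]
    for a, b in zip(names, names[1:])
}
for transition, rate in rates.items():
    print(f"{transition}: {rate:.0%}")
```

On these assumptions, most attrition happens before people ever reach a program, consistent with the conversion and selectivity bottlenecks the table identifies.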
Scaling Challenges
| Challenge | Description | Potential Solutions |
|---|---|---|
| Mentor bandwidth | Limited senior researchers available | Peer mentorship, async formats |
| Quality maintenance | Scaling may dilute intensity | Tiered programs |
| Funding | Programs need sustainable funding | Philanthropic, industry, government |
| Coordination | Many programs with unclear differentiation | Better information, specialization |
| Retention | Many trained researchers leave safety | Better career paths, culture |
Strategic Assessment
| Dimension | Assessment | Notes |
|---|---|---|
| Tractability | High | Known how to train researchers |
| If AI risk high | High | Need many more researchers |
| If AI risk low | Medium | Still valuable for responsible development |
| Neglectedness | Medium | $50M+ annually from Coefficient Giving, but scaling gaps remain |
| Timeline to impact | 1-5 years | Trained researchers take time to contribute |
| Grade | B+ | Important but faces scaling limits |
Risks Addressed
| Risk | Mechanism | Effectiveness |
|---|---|---|
| Inadequate safety research | More researchers doing safety work | High |
| Racing dynamics | Safety talent at labs can advocate | Medium |
| Field capture | Diverse training reduces groupthink | Medium |
Complementary Interventions
- Field Building - Broader ecosystem development
- Corporate Influence - Placing trained researchers at labs
- AI Safety Institutes - Employers for trained researchers
Sources
Program Information
- MATS: matsprogram.org - Official program information; Alumni Impact Analysis (2024)
- Anthropic Fellows: alignment.anthropic.com - Program details; 2026 cohort applications
- SPAR: sparai.org - Supervised Program for Alignment Research
- LASR Labs: lasrlabs.org - London AI Safety Research Labs
- BlueDot Impact: bluedot.org - AI safety courses and career support
- ARENA: arena.education - Alignment Research Engineer Accelerator
- Global AI Safety Fellowship: globalaisafetyfellowship.com
Funding and Ecosystem
- Coefficient Giving: 2024 Progress and 2025 Plans - $50M committed to technical AI safety in 2024
- Coefficient Giving: Technical AI Safety RFP - $40M+ available
Career Guidance
- 80,000 Hours: "AI Safety Syllabus" and career guide
- Alignment Forum: Career advice threads
- EA Forum: "Rank Best Universities for AI Safety"
Academic Programs
- University of York SAINTS CDT: york.ac.uk/study/postgraduate-research/centres-doctoral-training/safe-ai-training
- Stanford Center for AI Safety: aisafety.stanford.edu
- CHAI (Berkeley): humancompatible.ai