
AI Safety Training Programs

| Dimension | Rating | Notes |
|---|---|---|
| Tractability | High | Known how to train researchers; programs have proven track records |
| Scalability | Medium | Bottlenecked by mentor availability and quality maintenance |
| Current Maturity | Medium-High | Ecosystem established since 2021; 298+ MATS scholars trained |
| Time Horizon | 1-5 years | Trained researchers take 1-3 years to contribute meaningfully |
| Key Proponents | | MATS, BlueDot Impact, Anthropic, Coefficient Giving |
| Estimated Impact | Medium-High | Produces 100-200 new safety researchers annually |

The AI safety field faces a critical talent bottleneck. While funding has increased substantially—with Coefficient Giving committing roughly $50 million to technical AI safety research in 2024—the supply of researchers capable of doing high-quality technical safety work remains constrained. Training programs represent the primary pipeline for addressing this gap, offering structured pathways from general ML expertise to safety-specific research skills.

The landscape has evolved rapidly since 2020. MATS (ML Alignment Theory Scholars) has become the premier research mentorship program, with 78% of surveyed alumni now working in AI alignment. Anthropic launched a Fellows Program specifically for mid-career transitions. BlueDot Impact has trained over 7,000 people since 2022, with hundreds now working at organizations like Anthropic, OpenAI, and the UK AI Safety Institute. Academic programs are emerging at York (SAINTS CDT), Berkeley (CHAI), and Cambridge (CHIA). Independent research programs like SPAR and LASR Labs provide part-time pathways. Together, these programs produce perhaps 100-200 new safety researchers annually—a number that may be insufficient given the pace of AI capabilities advancement.

The strategic importance of training extends beyond individual researcher production. Programs shape research culture, determine which problems receive attention, and create networks that influence the field’s direction. How training programs select participants, what methodologies they emphasize, and which mentors they feature all have downstream effects on AI safety’s trajectory.

The major programs at a glance:

| Program | Duration | Format | Stipend | Selectivity | Key Outcomes |
|---|---|---|---|---|---|
| MATS | 12 weeks + 6-month extension | In-person (Berkeley, London) | Living stipend | ≈5-10% | 78% in alignment work; 75% publish |
| Anthropic Fellows | 6 months | In-person (SF) | $2,100/week | Selective | 40%+ hired full-time at Anthropic |
| LASR Labs | 13 weeks | In-person (London) | £11,000 | Moderate | All 5 Summer 2024 papers at NeurIPS |
| SPAR | 3 months | Remote, part-time | Varies | Moderate | Papers at ICML, NeurIPS; career fair |
| ARENA | 5 weeks | In-person (London) | Housing/travel | Moderate | Alumni at Apollo, METR, UK AISI |
| BlueDot Technical AI Safety | 8 weeks | Online cohorts | None | Low-moderate | 7,000+ trained; hundreds in field |

MATS is the most established and influential AI safety research program, pairing promising researchers with leading safety researchers for intensive mentorship. Since its inception in late 2021, MATS has supported 298 scholars and 75 mentors.

| Attribute | Details |
|---|---|
| Duration | 12 weeks intensive + 6 months extension |
| Format | In-person (Berkeley, London) |
| Focus | Technical alignment research |
| Mentors | Researchers from Anthropic, DeepMind, Redwood, FAR.AI, ARC |
| Compensation | Living stipend provided |
| Selectivity | ≈5-10% acceptance rate |
| Alumni outcomes | 78% now working in AI alignment |

Research Areas:

  • Interpretability and mechanistic understanding
  • AI control and containment
  • Scalable oversight
  • Evaluations and red-teaming
  • Robustness and security

Notable Alumni Contributions: MATS fellows have contributed to sparse autoencoders for interpretability, activation engineering research, developmental interpretability, and externalized reasoning oversight. Alumni have published at ICML and NeurIPS on safety-relevant topics. Nina Rimsky received an Outstanding Paper Award at ACL 2024 for “Steering Llama 2 via Contrastive Activation Addition.” Alumni have founded organizations including Apollo Research, Timaeus, Leap Labs, and the Center for AI Policy.
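
To give a concrete sense of one technique mentioned above, here is a minimal sketch of the contrastive activation addition idea behind steering-vector research. It uses toy NumPy arrays in place of a real model's residual stream; the dimensions, sample counts, and scale factor are illustrative assumptions, not details of the published work.

```python
# Minimal sketch of contrastive activation addition (steering vectors),
# using toy numpy activations instead of a real language model.
# Shapes, "prompt" counts, and the scale factor are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d_model = 16                      # toy residual-stream width

# Pretend these are residual-stream activations at one layer, collected
# while the model reads contrastive prompt pairs (e.g. two opposed behaviors).
positive_acts = rng.normal(loc=0.5, scale=1.0, size=(8, d_model))
negative_acts = rng.normal(loc=-0.5, scale=1.0, size=(8, d_model))

# The steering vector is the difference of mean activations.
steering_vector = positive_acts.mean(axis=0) - negative_acts.mean(axis=0)

# At inference time, the vector (times a chosen scale) is added back into the
# residual stream at that layer to push behavior toward the "positive" side.
def steer(residual_stream: np.ndarray, scale: float = 1.0) -> np.ndarray:
    return residual_stream + scale * steering_vector

new_activation = steer(rng.normal(size=d_model), scale=2.0)
print(new_activation.shape)  # (16,)
```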

Launched in 2024, the Anthropic Fellows Program targets mid-career technical professionals transitioning into AI safety research.

| Attribute | Details |
|---|---|
| Duration | 6 months full-time |
| Format | In-person (San Francisco) |
| Focus | Transition to safety research |
| Compensation | $2,100/week stipend + $15,000/month compute budget |
| Target | Mid-career technical professionals |
| First cohort | March 2025 |
| First cohort outcomes | Over 80% published papers; 40%+ joined Anthropic full-time |

The program addresses a specific gap: talented ML engineers and researchers who want to transition to safety work but lack the mentorship and runway to do so. By providing substantial compensation and direct collaboration with Anthropic researchers, it removes financial barriers to career change. First cohort fellows produced notable research including work on agentic misalignment, attribution graphs for mechanistic interpretability, and autonomous blockchain vulnerability exploitation.

SPAR (Supervised Program for Alignment Research)


SPAR offers a part-time, remote research fellowship enabling broader participation in safety research without requiring full-time commitment.

| Attribute | Details |
|---|---|
| Duration | 3 months |
| Format | Remote, part-time |
| Focus | AI safety and governance research |
| Target | Students and professionals |
| Output | Research projects culminating in Demo Day with career fair |
| Scale | 130+ projects offered in Spring 2026 (largest AI safety fellowship round) |

SPAR research has been accepted at ICML and NeurIPS, covered by TIME, and led to full-time job offers. Mentors come from Google DeepMind, RAND, Apollo Research, UK AISI, MIRI, and universities including Cambridge, Harvard, Oxford, and MIT. The program works well for:

  • Graduate students exploring safety research
  • Professionals testing interest before career change
  • Researchers in adjacent fields wanting to contribute

LASR Labs runs a cohort-based technical AI safety research program, preparing participants for roles at safety organizations.

| Attribute | Details |
|---|---|
| Duration | 13 weeks |
| Format | In-person (London) |
| Focus | Technical safety research |
| Stipend | £11,000 + office space, food, travel |
| 2024 Outcomes | All 5 Summer 2024 papers accepted to NeurIPS workshops |
| Career Outcomes | Alumni at UK AISI, Apollo Research, OpenAI dangerous capabilities team, Coefficient Giving |
| Satisfaction | 9.25/10 likelihood to recommend; NPS +75 |

Research topics include interpretability (sparse autoencoders, residual streams), AI control, and steganographic collusion in LLMs. Supervisors include researchers from Google DeepMind, Anthropic, and UK AISI.
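
As a rough illustration of the sparse autoencoder work mentioned above, the sketch below trains an overcomplete autoencoder on stand-in activations with a reconstruction loss plus an L1 sparsity penalty. The dimensions, penalty weight, and random data are illustrative assumptions, not any program's actual setup.

```python
# Minimal sketch of a sparse autoencoder of the kind used in interpretability
# research: reconstruct activations through a wider hidden layer while
# penalizing feature activations to encourage sparsity.
# Dimensions, penalty weight, and data here are illustrative assumptions.
import torch
import torch.nn as nn

d_model, d_hidden = 64, 256        # hidden layer wider than the input
encoder = nn.Linear(d_model, d_hidden)
decoder = nn.Linear(d_hidden, d_model)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
l1_coeff = 1e-3                    # strength of the sparsity penalty

acts = torch.randn(1024, d_model)  # stand-in for real model activations

for step in range(200):
    features = torch.relu(encoder(acts))        # sparse feature activations
    recon = decoder(features)                   # reconstructed activations
    loss = ((recon - acts) ** 2).mean() + l1_coeff * features.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```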

Impact Academy’s Global AI Safety Fellowship is a fully funded program (up to 6 months) connecting exceptional STEM talent with leading safety organizations.

| Attribute | Details |
|---|---|
| Duration | Up to 6 months |
| Format | In-person collaboration |
| Partners | CHAI (Berkeley), Conjecture, FAR.AI, UK AISI |
| Funding | Fully funded |

Academic programs offer a complementary, longer-horizon route into the field:

| Program | Institution | Focus | Status |
|---|---|---|---|
| SAINTS CDT | University of York (UK) | Safe Autonomy | Accepting applications |
| CHAI | UC Berkeley | Human-Compatible AI | Established |
| CHIA | Cambridge | Human-Inspired AI | Active |
| Steinhardt Lab | UC Berkeley | ML Safety | Active |
| Other ML programs | Various | General ML with safety focus | Many options |

University of York - SAINTS CDT: The UK’s first Centre for Doctoral Training specifically focused on AI safety, funded by UKRI. Brings together computer science, philosophy, law, sociology, and economics to train the next generation of safe AI experts. Based at the Institute for Safe Autonomy.

Key Academic Researchers: Prospective PhD students should consider advisors who work on safety-relevant topics:

  • Stuart Russell (Berkeley/CHAI) - Human-compatible AI
  • Jacob Steinhardt (Berkeley) - ML safety and robustness
  • Vincent Conitzer (CMU) - AI alignment theory
  • David Duvenaud (Toronto) - Interpretability
  • Roger Grosse (Toronto) - Training dynamics
  • Victor Veitch (Chicago) - Causal ML, safety

The academic and industry paths trade off as follows:

| Dimension | Academic Path | Industry Path |
|---|---|---|
| Timeline | 4-6 years | 0-2 years to entry |
| Research freedom | High | Varies |
| Resources | Limited | Often substantial |
| Publication | Expected | Sometimes restricted |
| Salary during training | PhD stipend (≈$10-50K) | Full salary or fellowship |
| Ultimate outcome | Research career | Research career |
| Best for | Deep expertise, theory | Immediate impact, applied |

For those not yet ready for formal programs or preferring self-directed learning:

| Resource | Provider | Coverage | Time Investment |
|---|---|---|---|
| AI Safety Syllabus | 80,000 Hours | Comprehensive reading list | 40-100+ hours |
| Technical AI Safety Course | BlueDot Impact | Structured curriculum | 8 weeks |
| AI Safety Operations Bootcamp | BlueDot Impact | Operations roles in AI safety | Intensive |
| ML Safety Course | Dan Hendrycks | Technical foundations | Semester |
| ARENA | ARENA | Technical implementations (mech interp, transformers) | 5 weeks |

BlueDot Impact has become the primary entry point into the AI safety field, training over 7,000 people since 2022 and raising $35M, including $25M in 2025. ARENA alumni have gone on to become MATS scholars, LASR participants, and AI safety engineers at Apollo Research, METR, and UK AISI.

Which path makes sense depends on your starting point:

| Your Situation | Recommended Path |
|---|---|
| Strong ML background, want safety focus | MATS or Anthropic Fellows |
| Exploring interest, employed | SPAR (part-time) |
| Student, want research experience | LASR Labs, SPAR |
| Early career, want PhD | Academic programs |
| Mid-career, want full transition | Anthropic Fellows |
| Strong background, want independence | Self-study + independent research |

Based on program outcomes, successful applicants typically have:

| Factor | Importance | How to Develop |
|---|---|---|
| ML technical skills | Critical | Courses, projects, publications |
| Research experience | High | Academic or industry research |
| Safety knowledge | Medium-High | Reading, courses, writing |
| Communication | Medium | Writing, presentations |
| Clear research interests | Medium | Reading, reflection, pilot projects |

Common application failure modes:

| Failure Mode | Description | Mitigation |
|---|---|---|
| Premature application | Applying without sufficient ML skills | Build fundamentals first |
| No research output | Nothing demonstrating research capability | Complete pilot project |
| Vague interests | Unable to articulate what you want to work on | Read extensively, form views |
| Poor fit | Mismatch between interests and program | Research programs carefully |
| Giving up early | Rejection discouragement | Multiple applications, iterate |

The talent pipeline narrows sharply at each stage:

| Stage | Annual Output | Bottleneck |
|---|---|---|
| Interested individuals | Thousands | Conversion |
| Program applicants | 500-1000 | Selectivity |
| Program participants | 150-300 | Capacity |
| Research-productive alumni | 100-200 | Mentorship |
| Long-term field contributors | 50-100 | Retention |
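
As a rough back-of-envelope reading of the funnel above, the sketch below computes stage-to-stage conversion rates from the table's ranges. Using range midpoints, and pinning "thousands" of interested individuals to 2,000, are simplifying assumptions for illustration only.

```python
# Back-of-envelope sketch of the talent-pipeline funnel from the table above.
# Stage counts are the table's ranges; midpoints and the 2,000 figure for
# "thousands" of interested individuals are simplifying assumptions.
stages = {
    "interested individuals": (2000, 2000),   # placeholder for "thousands"
    "program applicants": (500, 1000),
    "program participants": (150, 300),
    "research-productive alumni": (100, 200),
    "long-term field contributors": (50, 100),
}

midpoints = {name: sum(bounds) / 2 for name, bounds in stages.items()}
names = list(midpoints)
for prev, curr in zip(names, names[1:]):
    rate = midpoints[curr] / midpoints[prev]
    print(f"{prev} -> {curr}: ~{rate:.0%} conversion")

# End to end, only about 4% of interested individuals become long-term
# contributors under these midpoint assumptions.
```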

Scaling the pipeline faces several challenges:

| Challenge | Description | Potential Solutions |
|---|---|---|
| Mentor bandwidth | Limited senior researchers available | Peer mentorship, async formats |
| Quality maintenance | Scaling may dilute intensity | Tiered programs |
| Funding | Programs need sustainable funding | Philanthropic, industry, government |
| Coordination | Many programs with unclear differentiation | Better information, specialization |
| Retention | Many trained researchers leave safety | Better career paths, culture |

Overall assessment:

| Dimension | Assessment | Notes |
|---|---|---|
| Tractability | High | Known how to train researchers |
| If AI risk high | High | Need many more researchers |
| If AI risk low | Medium | Still valuable for responsible development |
| Neglectedness | Medium | $50M+ annually from Coefficient Giving but scaling gaps |
| Timeline to impact | 1-5 years | Trained researchers take time to contribute |
| Grade | B+ | Important but faces scaling limits |

How training programs address specific risks:

| Risk | Mechanism | Effectiveness |
|---|---|---|
| Inadequate safety research | More researchers doing safety work | High |
| Racing dynamics | Safety talent at labs can advocate | Medium |
| Field capture | Diverse training reduces groupthink | Medium |

  • Field Building - Broader ecosystem development
  • Corporate Influence - Placing trained researchers at labs
  • AI Safety Institutes - Employers for trained researchers
  • Coefficient Giving: 2024 Progress and 2025 Plans - $50M committed to technical AI safety in 2024
  • Coefficient Giving: Technical AI Safety RFP - $40M+ available
  • 80,000 Hours: “AI Safety Syllabus” and career guide
  • Alignment Forum: Career advice threads
  • EA Forum: “Rank Best Universities for AI Safety”
  • University of York SAINTS CDT: york.ac.uk/study/postgraduate-research/centres-doctoral-training/safe-ai-training
  • Stanford Center for AI Safety: aisafety.stanford.edu
  • CHAI (Berkeley): humancompatible.ai

AI safety training programs improve outcomes in the AI Transition Model through multiple factors:

| Factor | Parameter | Impact |
|---|---|---|
| Misalignment Potential | Safety-Capability Gap | Produces 100-200 new safety researchers annually to address research talent bottleneck |
| Misalignment Potential | Alignment Robustness | Mentored researchers produce higher-quality alignment work |
| Civilizational Competence | Institutional Quality | Trained researchers staff AI Safety Institutes and governance organizations |

Training programs are critical infrastructure for the field; their effectiveness is bottlenecked by limited mentor bandwidth and retention challenges.