MATS Program Mentor Directory
web · matsprogram.org/mentors
MATS (ML Alignment Theory Scholars) is a competitive research training program that pairs emerging AI safety researchers with experienced mentors; this directory is useful for identifying active researchers and their institutional affiliations within the AI safety field.
Metadata
Importance: 45/100 · tool page · reference
Summary
This page lists the mentors affiliated with the ML Alignment Theory Scholars (MATS) program, a research training initiative connecting emerging AI safety researchers with experienced mentors. The mentors work across AI alignment, interpretability, transparency, and AI security. The page also invites applications from prospective mentors.
Key Points
- Lists current MATS mentors working on AI alignment, transparency, and security research areas
- Includes prominent researchers such as Nicholas Carlini, Stephen Casper, Jacob Hilton, and others from leading AI safety organizations
- Each mentor has an individual profile page with more detail about their research focus and mentorship offerings
- The program actively recruits new mentors via an expression-of-interest application form
- Represents a cross-section of the AI safety research community spanning technical safety, interpretability, and governance
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| MATS ML Alignment Theory Scholars program | Organization | 60.0 |
Cached Content Preview
HTTP 200 · Fetched Apr 7, 2026 · 13 KB
MATS Mentors
Mentors MATS mentors are advancing the frontiers of AI alignment, transparency, and security
Apply to mentor
Filters: Strategy and Forecasting, Scheming and Deception, Policy and Governance, Compute and Hardware, Compute Infrastructure, Alignment Training, Dangerous Capability Evals, Biorisk, Interpretability, Red-Teaming, Safeguards, Multi-Agent Safety, Security, Scalable Oversight, Model Organisms, Monitoring, Adversarial Robustness, AI Welfare, Agent Foundations, Control
Buck Shlegeris — Redwood Research, CEO
Buck is the CEO of Redwood Research.
Focus: Empirical Control, Model Organisms, Scheming and Deception, Strategy and Forecasting
Programs: Winter 2024, Summer 2024, Winter 2025, Winter 2026, Summer 2026

Ethan Perez — Anthropic, Member of Technical Staff
Ethan Perez is a researcher at Anthropic, where he leads a team working on AI control, adversarial robustness, and other areas of AI safety research. His interests span many areas of LLM safety; he has previously led work on sleeper agents, red-teaming language models with language models, developing AI safety via debate using LLMs, and demonstrating and reducing unfaithfulness in chain-of-thought reasoning. Read more on his website.
Focus: Empirical Control, Model Organisms, Red-Teaming, Scheming and Deception
Programs: Summer 2023, Summer 2024, Winter 2025, Winter 2026, Summer 2026

Samuel Marks — Anthropic, Member of Technical Staff
Sam leads the Cognitive Oversight subteam of Anthropic's Alignment Science team. The subteam's goal is to oversee AI systems based not on whether they exhibit good input/output behavior, but on whether there is anything suspicious about the cognitive processes underlying that behavior. For example, one in-scope problem is detecting when language models are lying, including in cases where it is difficult to tell from inputs and outputs alone. His team is interested in both white-box techniques (e.g., interpretability-based methods) and black-box techniques (e.g., finding good ways to interrogate models about their thought processes and motivations). For more on this research direction, see his post: https://www.lesswrong.com/posts/s7uD3tzHMvD868ehr/discriminating-behaviorally-identical-classifiers-a-model
Focus: Empirical Control, Model Organisms, Red-Teaming, Scheming and Deception
Programs: Winter 2026, Summer 2026

Neel Nanda — Google DeepMind, Senior Research Scientist
Neel leads the mechanistic interpretability team at Google DeepMind, trying to use the internals of models to understand them better and to make them safer, e.g., detecting deception, understanding concerning behaviours, and monitoring deployed systems for harmful behavi
... (truncated, 13 KB total)
Resource ID: 38efa30161220491 | Stable ID: sid_DOWGyYhHXM