MATS Program Mentor Directory
web · matsprogram.org/mentors
MATS (ML Alignment Theory Scholars) is a competitive research training program that pairs emerging AI safety researchers with experienced mentors; this directory is useful for identifying active researchers and their institutional affiliations within the AI safety field.
Metadata
Importance: 45/100 · tool page · reference
Summary
This page lists the mentors affiliated with the ML Alignment Theory Scholars (MATS) program, a research training initiative connecting emerging AI safety researchers with experienced mentors. The mentors work across AI alignment, interpretability, transparency, and AI security. The page also invites applications from prospective mentors.
Key Points
- Lists current MATS mentors working on AI alignment, transparency, and security research areas
- Includes prominent researchers such as Nicholas Carlini, Stephen Casper, Jacob Hilton, and others from leading AI safety organizations
- Each mentor has an individual profile page with more detail about their research focus and mentorship offerings
- The program actively recruits new mentors via an expression-of-interest application form
- Represents a cross-section of the AI safety research community spanning technical safety, interpretability, and governance
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| MATS ML Alignment Theory Scholars program | Organization | 60.0 |
Cached Content Preview
HTTP 200 · Fetched Apr 7, 2026 · 13 KB
MATS Mentors
Mentors MATS mentors are advancing the frontiers of AI alignment, transparency, and security
Apply to mentor
Filters: Strategy and Forecasting, Scheming and Deception, Policy and Governance, Compute and Hardware, Compute Infrastructure, Alignment Training, Dangerous Capability Evals, Biorisk, Interpretability, Red-Teaming, Safeguards, Multi-Agent Safety, Security, Scalable Oversight, Model Organisms, Monitoring, Adversarial Robustness, AI Welfare, Agent Foundations, Control
Buck Shlegeris — Redwood Research, CEO
Buck is the CEO of Redwood Research.
Focus: Empirical Control, Model Organisms, Scheming and Deception, Strategy and Forecasting
Programs: Winter 2024, Summer 2024, Winter 2025, Winter 2026, Summer 2026

Ethan Perez — Anthropic, Member of Technical Staff
Ethan Perez is a researcher at Anthropic, where he leads a team working on AI control, adversarial robustness, and other areas of AI safety research. His interests span many areas of LLM safety; he has previously led work on sleeper agents, red-teaming language models with language models, developing AI safety via debate using LLMs, and demonstrating and reducing unfaithfulness in chain-of-thought reasoning. Read more on his website.
Focus: Empirical Control, Model Organisms, Red-Teaming, Scheming and Deception
Programs: Summer 2023, Summer 2024, Winter 2025, Winter 2026, Summer 2026

Samuel Marks — Anthropic, Member of Technical Staff
Sam leads the Cognitive Oversight subteam of Anthropic's Alignment Science team. The subteam's goal is to oversee AI systems based not on whether they exhibit good input/output behavior, but on whether there is anything suspicious about the cognitive processes underlying that behavior. For example, one in-scope problem is detecting when language models are lying, including in cases where it is difficult to tell from inputs and outputs alone. His team is interested in both white-box techniques (e.g., interpretability-based methods) and black-box techniques (e.g., finding good ways to interrogate models about their thought processes and motivations). For more on this research direction, see his post: https://www.lesswrong.com/posts/s7uD3tzHMvD868ehr/discriminating-behaviorally-identical-classifiers-a-model
Focus: Empirical Control, Model Organisms, Red-Teaming, Scheming and Deception
Programs: Winter 2026, Summer 2026

Neel Nanda — Google DeepMind, Senior Research Scientist
Neel leads the mechanistic interpretability team at Google DeepMind, trying to use the internals of models to understand them better and to make them safer, e.g., detecting deception, understanding concerning behaviours, and monitoring deployed systems for harmful behavi
... (truncated, 13 KB total)
Resource ID: 38efa30161220491 | Stable ID: sid_DOWGyYhHXM