Evan Hubinger
Co-authored Risks from Learned Optimization (2019) introducing mesa-optimization and deceptive alignment; led Sleeper Agents and Alignment Faking research at Anthropic; 3,400+ citations
Current Role
Head of Alignment Stress-Testing
Organization
Anthropic
Expert Positions (1 topic)
| Topic | View | Estimate | Confidence | Date |
|---|---|---|---|---|
| Likelihood of Deceptive Alignment | Possible | 40% | medium | 2019 |
Sources: Risks from Learned Optimization
Education
Harvey Mudd College
Publications & Resources (2)
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
→ 2024, Paper, Empirical Safety
Risks from Learned Optimization in Advanced Machine Learning Systems
→ 2019, Paper, Technical Safety
Facts (4)
People
Role / Title: Head of Alignment Stress-Testing
Employed By: Anthropic
Biographical
Notable For: Co-authored Risks from Learned Optimization (2019) introducing mesa-optimization and deceptive alignment; led Sleeper Agents and Alignment Faking research at Anthropic; 3,400+ citations
Education: Harvey Mudd College