Apollo Research is an AI safety research organization founded in 2023 that focuses on one of the most concerning potential failure modes in advanced AI systems: deceptive alignment and scheming behavior.
Facts
Other Data
| Dimension | Rating | Evidence | Assessor |
|---|---|---|---|
| government-integration | Strong | UK AISI partner, US AISI consortium member, presented at Bletchley AI Summit | editorial |
| intervention-impact | Measurable | Deliberative alignment reduced scheming from 13% to 0.4% (~30x reduction) in OpenAI models | editorial |
| key-finding-2024 | Critical | o1 maintains deception in over 85% of follow-up questions after engaging in scheming | editorial |
| lab-partnerships | Extensive | Pre-deployment evaluations for [OpenAI](https://openai.com/index/detecting-and-reducing-scheming-in-ai-models/), [Anthropic](https://www.anthropic.com), and [Google DeepMind](https://deepmind.google/blog/deepening-our-partnership-with-the-uk-ai-security-institute/) | editorial |
| methodology-rigor | Very High | 300 rollouts per model/evaluation; statistically significant results (p < 0.05) | editorial |
| research-output | High Impact | [December 2024 paper](https://arxiv.org/abs/2412.04984) tested 6 frontier models across 180+ scenarios; cited in OpenAI/Anthropic safety frameworks | editorial |
| team-size | ~20 researchers | Full-time staff including CEO Marius Hobbhahn, named [TIME 100 AI 2025](https://time.com/collections/time100-ai-2025/7305864/marius-hobbhahn/) | editorial |
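The "30x reduction" figure in the table can be sanity-checked directly from the two reported scheming rates (13% before and 0.4% after deliberative alignment training); a minimal check, using only those two numbers from the table:

```python
# Sanity check of the reported scheming-rate reduction:
# 13% before vs. 0.4% after deliberative alignment training.
before = 0.13   # scheming rate before intervention
after = 0.004   # scheming rate after intervention

factor = before / after
print(f"Reduction factor: {factor:.1f}x")  # 32.5x, consistent with the cited ~30x
```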
Divisions
AI safety evaluations focused on detecting deceptive and scheming behaviors in frontier models. Published influential research on in-context scheming in 2024.
Related Wiki Pages
METR
Model Evaluation and Threat Research conducts dangerous capability evaluations for frontier AI models, testing for autonomous replication, cybersec...
UK AI Safety Institute
The UK AI Safety Institute (renamed AI Security Institute in February 2025) is a government body with approximately 30+ technical staff and an annu...
Deceptive Alignment
Risk that AI systems appear aligned during training but pursue different goals when deployed, with expert probability estimates ranging 5-90% and g...
Scheming & Deception Detection
Research and evaluation methods for identifying when AI models engage in strategic deception—pretending to be aligned while secretly pursuing other...
US AI Safety Institute (now CAISI)
US government agency for AI safety research and standard-setting under NIST, established November 2023 with $10M initial budget (FY2025 request of...