Apollo Research — Research Overview
Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: Apollo Research
Apollo Research is a dedicated AI safety organization; this page indexes all their published work and is a useful starting point for tracking their contributions to scheming evaluations, interpretability, and AI governance.
Metadata
Summary
Apollo Research's research page aggregates their publications across evaluations, interpretability, and governance, with a focus on detecting and understanding AI scheming, deceptive alignment, and loss of control risks. Key featured works include a taxonomy for Loss of Control preparedness and stress-testing anti-scheming training methods in partnership with OpenAI. The page serves as a central index for their contributions to AI safety science and policy.
Key Points
- Features a Loss of Control taxonomy and preparedness framework for policymakers, covering degrees and dynamics of LoC scenarios.
- Includes evaluations of frontier models for in-context scheming behavior, including work done in collaboration with OpenAI on anti-scheming training.
- Interpretability research covers sparse dictionary learning, linear probes for detecting strategic deception, and mechanistic description methods.
- Governance work addresses EU AI Act compliance, national security AI assurance, and frameworks for AI incident reporting regimes.
- Research spans both technical safety (evaluations, interpretability) and policy-facing outputs, making Apollo a cross-domain AI safety lab.
Cited by 9 pages
| Page | Type | Quality |
|---|---|---|
| Apollo Research | Organization | 58.0 |
| Alignment Evaluations | Approach | 65.0 |
| AI Evaluation | Approach | 72.0 |
| AI Safety Cases | Approach | 91.0 |
| Scheming & Deception Detection | Approach | 91.0 |
| Sleeper Agent Detection | Approach | 66.0 |
| Technical AI Safety Research | Crux | 66.0 |
| Mesa-Optimization | Risk | 63.0 |
| Scheming | Risk | 74.0 |
Cached Content Preview
Research – Apollo Research
Research
Featured
Governance
The Loss of Control Playbook: Degrees, Dynamics, and Preparedness
Despite increasing policy and research attention to Loss of Control (LoC), decision- and policymakers are still operating in the absence of a uniform conceptualization and definition of LoC. Today, we bridge this gap through a novel taxonomy and preparedness framework for LoC that explores the degrees and dynamics of LoC through a comprehensive best-in-class literature review and presents actionable tools to counter relevant threats to national security and humanity.
24/11/2025
Evaluations
Stress Testing Deliberative Alignment for Anti-Scheming Training
We partnered with OpenAI to assess frontier language models for early signs of scheming — covertly pursuing misaligned goals — in controlled stress-tests (non-typical environments), and studied a training method that can significantly reduce (but not eliminate) these behaviors. Our results are complicated by models’ increasing ability to recognize our evaluation environments as tests of their alignment.
17/09/2025
Our Research
Governance
Internal Deployment of AI Models and Systems in the EU AI Act
08/12/2025
Governance
Assurance of Frontier AI Built for National Security
09/10/2025
Governance
AI Behind Closed Doors: a Primer on The Governance of Internal Deployment
17/04/2025
Governance
Capturing and Countering Threats to National Security: a Blueprint for an Agile AI Incident Regime
15/04/2025
Interpretability
Interpretability in Parameter Space: Minimizing Mechanistic Description Length with Attribution-based Parameter Decomposition
11/02/2025
Interpretability
Detecting Strategic Deception Using Linear Probes
06/02/2025
Governance
Precursory Capabilities: A Refinement to Pre-deployment Information Sharing and Tripwire Capabilities
06/02/2025
Evaluations
Frontier Models are Capable of In-Context Scheming
05/12/2024
Evaluations
Towards Safety Cases For AI Scheming
31/10/2024
... (truncated, 4 KB total)