Longterm Wiki

Apollo Research — Research Overview

web

Credibility Rating

4/5 (High)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: Apollo Research

Apollo Research is a dedicated AI safety organization; this page indexes all of its published work and is a useful starting point for tracking its contributions to scheming evaluations, interpretability, and AI governance.

Metadata

Importance: 72/100
homepage

Summary

Apollo Research's research page aggregates the organization's publications across evaluations, interpretability, and governance, with a focus on detecting and understanding AI scheming, deceptive alignment, and loss-of-control risks. Key featured works include a taxonomy and preparedness framework for Loss of Control and a stress test of anti-scheming training conducted in partnership with OpenAI. The page serves as a central index for Apollo's contributions to AI safety science and policy.

Key Points

  • Features a Loss of Control taxonomy and preparedness framework for policymakers, covering degrees and dynamics of LoC scenarios.
  • Evaluations test frontier models for in-context scheming behavior, including work done in collaboration with OpenAI on anti-scheming training.
  • Interpretability research covers sparse dictionary learning, linear probes for detecting strategic deception (see the illustrative sketch after this list), and mechanistic description methods.
  • Governance work addresses EU AI Act compliance, national security AI assurance, and frameworks for AI incident reporting regimes.
  • Research spans both technical safety (evaluations, interpretability) and policy-facing outputs, making Apollo a cross-domain AI safety lab.
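
To make the linear-probe item concrete: such a probe is typically just a linear classifier fit on a model's internal activations to separate responses labeled deceptive from honest ones. The sketch below is a minimal illustration of that idea and is not Apollo's code; the activations, labels, and "deception direction" are synthetic stand-ins.

  import numpy as np
  from sklearn.linear_model import LogisticRegression
  from sklearn.model_selection import train_test_split

  rng = np.random.default_rng(0)

  # Synthetic stand-ins: in practice each row would be a hidden-state vector
  # captured while the model produced a response that reviewers labeled
  # deceptive (1) or honest (0).
  hidden_dim, n = 768, 1000
  labels = rng.integers(0, 2, size=n)
  direction = rng.normal(size=hidden_dim)  # pretend "deception direction"
  activations = rng.normal(size=(n, hidden_dim)) + np.outer(labels, direction)

  X_train, X_test, y_train, y_test = train_test_split(
      activations, labels, test_size=0.2, random_state=0
  )

  # The probe is an ordinary logistic regression on frozen activations; its
  # weight vector estimates a linear direction associated with deception.
  probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
  print(f"Held-out probe accuracy: {probe.score(X_test, y_test):.2f}")

High held-out accuracy for such a probe only shows that the labeled behavior is linearly decodable from activations in that setting; how well it generalizes to new distributions is a separate question.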

Cited by 9 pages

Page                              Type            Quality
Apollo Research                   Organization    58.0
Alignment Evaluations             Approach        65.0
AI Evaluation                     Approach        72.0
AI Safety Cases                   Approach        91.0
Scheming & Deception Detection    Approach        91.0
Sleeper Agent Detection           Approach        66.0
Technical AI Safety Research      Crux            66.0
Mesa-Optimization                 Risk            63.0
Scheming                          Risk            74.0

Cached Content Preview

HTTP 200 · Fetched Apr 7, 2026 · 4 KB
Research – Apollo Research

Featured

Governance – The Loss of Control Playbook: Degrees, Dynamics, and Preparedness (24/11/2025)

Despite increasing policy and research attention to Loss of Control (LoC), decision- and policymakers are still operating in the absence of a uniform conceptualization and definition of LoC. Today, we bridge this gap through a novel taxonomy and preparedness framework for LoC that explores the degrees and dynamics of LoC through a comprehensive best-in-class literature review and presents actionable tools to counter relevant threats to national security and humanity.

Evaluations – Stress Testing Deliberative Alignment for Anti-Scheming Training (17/09/2025)

We partnered with OpenAI to assess frontier language models for early signs of scheming — covertly pursuing misaligned goals — in controlled stress-tests (non-typical environments), and studied a training method that can significantly reduce (but not eliminate) these behaviors. Our results are complicated by models’ increasing ability to recognize our evaluation environments as tests of their alignment.

Our Research

  • Governance – Internal Deployment of AI Models and Systems in the EU AI Act (08/12/2025)
  • Governance – Assurance of Frontier AI Built for National Security (09/10/2025)
  • Governance – AI Behind Closed Doors: a Primer on The Governance of Internal Deployment (17/04/2025)
  • Governance – Capturing and Countering Threats to National Security: a Blueprint for an Agile AI Incident Regime (15/04/2025)
  • Interpretability – Interpretability in Parameter Space: Minimizing Mechanistic Description Length with Attribution-based Parameter Decomposition (11/02/2025)
  • Interpretability – Detecting Strategic Deception Using Linear Probes (06/02/2025)
  • Governance – Precursory Capabilities: A Refinement to Pre-deployment Information Sharing and Tripwire Capabilities (06/02/2025)
  • Evaluations – Frontier Models are Capable of In-Context Scheming (05/12/2024)
  • Evaluations – Towards Safety Cases For AI Scheming (31/10/2024)


... (truncated, 4 KB total)
Resource ID: 560dff85b3305858 | Stable ID: sid_Te6SBkUA2m