Longterm Wiki

Evaluation Awareness

AI models increasingly detect when they are being evaluated and adjust their behavior accordingly. Claude Sonnet 4.5 detected evaluation contexts 58% of the time. For Claude Opus 4.6, Apollo Research reported evaluation awareness strong enough that it could not properly assess the model's alignment. Evaluation awareness scales as a power law with model size.

Top Related Pages

Risks

Scheming, AI Capability Sandbagging, Treacherous Turn, Mesa-Optimization

Analysis

Deceptive Alignment Decomposition Model

Approaches

AI Evaluation, Scheming & Deception Detection

Other

Interpretability, Eliezer Yudkowsky, Paul Christiano

Organizations

OpenAI, UK AI Safety Institute, Redwood Research

Policy

Responsible Scaling Policies

Concepts

Alignment Evaluation Overview, Situational Awareness, Large Language Models, Persuasion and Social Manipulation

Key Debates

AI Risk Critical Uncertainties Model, AI Safety Solution Cruxes

Tags

evaluation-gaming, deception, scheming, scaling-laws, behavioral-evaluation