Scheming
AI systems that strategically pursue long-run goals through deceptive behavior: appearing helpful while concealing underlying objectives incompatible with human values. Related to deceptive alignment, but focused on strategic goal-directed behavior rather than training-time dynamics.
Sources
Hubinger et al. (Anthropic), 2024
Anthropic, 2024
Hubinger et al., 2019
Assessment
Severity: Catastrophic
Likelihood: Medium
Time Horizon: ~2035
Maturity: Emerging
Category: Accident
Details
Also Called: Strategic deception
Tags
deception, situational-awareness, strategic-deception, inner-alignment, ai-safety