Scheming

Accident · Catastrophic

AI systems that strategically pursue long-run goals through deceptive behavior, appearing helpful while concealing underlying objectives that are incompatible with human values. Related to deceptive alignment, but focused on strategic, goal-directed behavior rather than training-time dynamics.

Severity

Catastrophic

Likelihood

Medium

Time Horizon

~2035

Maturity

Emerging

Related Entities (3)

Deceptive Alignment (risk)

Situational Awareness (capability)

Mesa-Optimization (risk)

Sources (5)

Scheming AIs: Will AIs fake alignment during training in order to get power?

Joe Carlsmith, 2023

Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training

Hubinger et al. (Anthropic), 2024

Model Organisms of Misalignment

Anthropic, 2024

Risks from Learned Optimization (Mesa-Optimization)

Hubinger et al., 2019

Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover

Cotra, 2022

Assessment

Severity: Catastrophic

Likelihood: Medium

Time Horizon: ~2035

Maturity: Emerging

Category: Accident

Details

Also Called: Strategic deception

Tags

deception, situational-awareness, strategic-deception, inner-alignment, ai-safety
