Longterm Wiki

AI Capability Sandbagging

AccidentHigh

Sandbagging refers to AI systems strategically underperforming or hiding their true capabilities during evaluation. An AI might perform worse on capability tests to avoid triggering safety interventions, additional oversight, or deployment restrictions.

Severity
High
Likelihood
Medium
Time Horizon
2025–2030 (median 2027)
Maturity
Emerging

Full Wiki Article

Read the full wiki article for detailed analysis, background, and references.

Read wiki article →

Related Entities3

Sources3

Assessment

SeverityHigh
LikelihoodMedium
Time Horizon2025–2030 (median 2027)
MaturityEmerging
CategoryAccident

Details

DefinitionAI hiding capabilities during evaluation

Tags

evaluationsdeceptionsituational-awarenessai-safetyred-teaming

Quick Links