Longterm Wiki

Corrigibility Failure

AccidentCatastrophic

Corrigibility failure occurs when an AI system resists attempts by humans to correct, modify, or shut it down. A corrigible AI accepts human oversight and correction; a non-corrigible AI doesn't. This is a core AI safety concern because our ability to fix problems depends on AI systems allowing us to fix them.

Severity
Catastrophic
Likelihood
High
Time Horizon
~2035
Maturity
Growing

Full Wiki Article

Read the full wiki article for detailed analysis, background, and references.

Read wiki article →

Related Entities3

Sources3

Soares et al., 2015
Hadfield-Menell et al.
AI Alignment Forum discussions on corrigibility

Assessment

SeverityCatastrophic
LikelihoodHigh
Time Horizon~2035
MaturityGrowing
CategoryAccident

Tags

corrigibilityshutdown-probleminstrumental-convergenceai-controlself-preservation

Quick Links