Corrigibility Failure
Corrigibility failure occurs when an AI system resists attempts by humans to correct, modify, or shut it down. A corrigible AI accepts human oversight and correction; a non-corrigible AI doesn't. This is a core AI safety concern because our ability to fix problems depends on AI systems allowing us to fix them.
Sources
Soares et al., 2015
Hadfield-Menell et al.
AI Alignment Forum discussions on corrigibility
Assessment
Severity: Catastrophic
Likelihood: High
Time Horizon: ~2035
Maturity: Growing
Category: Accident
Tags
corrigibility, shutdown-problem, instrumental-convergence, ai-control, self-preservation