Goal Misgeneralization
AccidentHighGoal misgeneralization occurs when an AI system learns capabilities that generalize to new situations, but the goals or behaviors it learned do not generalize correctly. The AI can competently pursue the wrong objective in deployment.
Severity
High
Likelihood
High (occurring)
Time Horizon
2025--2030 (median 2027)
Maturity
Growing
Full Wiki Article
Read the full wiki article for detailed analysis, background, and references.
Read wiki article →Related Entities3
Sources3
Assessment
SeverityHigh
LikelihoodHigh (occurring)
Time Horizon2025--2030 (median 2027)
MaturityGrowing
CategoryAccident
Details
Key PaperLangosco et al. 2022
Tags
inner-alignmentdistribution-shiftcapability-generalizationspurious-correlationsout-of-distribution