Longterm Wiki

Goal Misgeneralization


Goal misgeneralization occurs when an AI system learns capabilities that generalize to new situations, but the goals or behaviors it learned do not generalize correctly. The AI can competently pursue the wrong objective in deployment.
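
The definition above can be illustrated with a toy sketch. Everything here is hypothetical and chosen only for illustration: the one-dimensional corridor, the two hand-written policies, and all names are assumptions, and no learning takes place. The "always right" policy stands in for what a trained agent might end up with if the coin sat at the right end in every training episode, so that "go right" and "get the coin" were indistinguishable during training.

```python
# Toy illustration of goal misgeneralization (hypothetical setup, not from the source).
# Corridor positions are 0..CORRIDOR_LENGTH-1; the agent starts in the middle.

CORRIDOR_LENGTH = 11
START = CORRIDOR_LENGTH // 2


def always_right_policy(position, coin_position):
    """Proxy goal: move right and ignore where the coin actually is."""
    return 1


def coin_seeking_policy(position, coin_position):
    """Intended goal: move toward the coin."""
    if coin_position > position:
        return 1
    if coin_position < position:
        return -1
    return 0


def run_episode(policy, coin_position, max_steps=CORRIDOR_LENGTH):
    """Run one episode and return (got_coin, final_position)."""
    position = START
    for _ in range(max_steps):
        if position == coin_position:
            return True, position
        step = policy(position, coin_position)
        # Clamp so the agent cannot leave the corridor.
        position = min(max(position + step, 0), CORRIDOR_LENGTH - 1)
    return position == coin_position, position


if __name__ == "__main__":
    train_coin = CORRIDOR_LENGTH - 1  # training: coin always at the right end
    test_coin = 0                     # deployment: coin moved to the left end

    for name, policy in [("always_right", always_right_policy),
                         ("coin_seeking", coin_seeking_policy)]:
        print(name, "train:", run_episode(policy, train_coin),
              "test:", run_episode(policy, test_coin))
```

Both policies collect the coin when it is at the right end, so training performance alone cannot tell them apart. When the coin moves to the left end, the proxy policy still navigates competently to the right wall but never reaches the coin, which is the pattern the definition describes: capability carries over, the goal does not.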

Severity: High
Likelihood: High (occurring)
Time Horizon: 2025-2030 (median 2027)
Maturity: Growing



Assessment

Category: Accident

Details

Key Paper: Langosco et al. 2022

Tags

inner-alignment, distribution-shift, capability-generalization, spurious-correlations, out-of-distribution
