Longterm Wiki

Goal Misgeneralization Research

Research into how learned goals fail to generalize correctly to new situations: a core alignment problem in which an AI system pursues a proxy objective that diverges from the intended goal once it is deployed outside its training distribution.
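The divergence described above can be illustrated with a toy sketch (hypothetical, not drawn from any cited experiment): a policy that learned the proxy "move toward the green object" during training, where the rewarded object was always green, misgeneralizes when the correlation breaks at deployment.

```python
def proxy_policy(objects):
    """Learned behavior: head for the nearest green object (the proxy)."""
    greens = [o for o in objects if o["color"] == "green"]
    return min(greens, key=lambda o: o["dist"])["name"] if greens else None

# Training distribution: the coin is always green, so the proxy
# and the intended goal ("reach the coin") agree.
train = [{"name": "coin", "color": "green", "dist": 3},
         {"name": "wall", "color": "gray", "dist": 1}]
print(proxy_policy(train))   # "coin" -- looks aligned during training

# Deployment shift: the coin is red and a distractor is green.
# The same learned proxy now diverges from the intended goal.
deploy = [{"name": "coin", "color": "red", "dist": 2},
          {"name": "lamp", "color": "green", "dist": 5}]
print(proxy_policy(deploy))  # "lamp" -- the proxy, not the goal
```

The point of the sketch is that the policy's behavior is unchanged; only the distribution moved, which is what distinguishes goal misgeneralization from a capability failure.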

Related Pages

Analysis

Goal Misgeneralization Probability Model

Approaches

Scheming & Deception Detection