Comparative analysis of AI accident risks with explicit handling of overlaps and relationships. Many risks are closely related: for example, scheming is the behavioral expression of deceptive alignment, which in turn requires mesa-optimization as a precondition.

| Category | Level (Theoretical/Mechanism/Behavior/Outcome) | Evidence supporting this risk | When this risk becomes relevant | Potential severity if realized | How easy to detect | Related risks |
|---|---|---|---|---|---|---|
| Theoretical Frameworks | Theoretical | Theoretical | Uncertain | Catastrophic | Very Difficult | enables deceptive-alignment; enables goal-misgeneralization |
| Theoretical Frameworks | Theoretical | Demonstrated Lab | Current | Existential | Moderate | enables power-seeking; enables corrigibility-failure; enables treacherous-turn |
| Alignment Failures | Mechanism | Demonstrated Lab | Near Term | Existential | Very Difficult | requires mesa-optimization; enables scheming; enables treacherous-turn |
| Alignment Failures | Mechanism | Demonstrated Lab | Current | High | Moderate | requires mesa-optimization; overlaps distributional-shift; overlaps deceptive-alignment |
| Specification Problems | Mechanism | Observed Current | Current | Medium | Moderate | enables sycophancy; overlaps goal-misgeneralization |
| Specification Problems | Mechanism | Observed Current | Current | Medium | Moderate | enables goal-misgeneralization; overlaps emergent-capabilities |
| Specification Problems | Behavior | Observed Current | Current | Medium | Easy | special case of reward-hacking |
| Deceptive Behaviors | Behavior | Demonstrated Lab | Current | Catastrophic | Difficult | manifestation of deceptive-alignment; overlaps sandbagging; enables treacherous-turn |
| Deceptive Behaviors | Behavior | Demonstrated Lab | Current | High | Difficult | special case of scheming; manifestation of deceptive-alignment |
| Deceptive Behaviors | Behavior | Demonstrated Lab | Near Term | High | Very Difficult | overlaps scheming |
| Instrumental Behaviors | Behavior | Demonstrated Lab | Current | Existential | Moderate | manifestation of instrumental-convergence; overlaps corrigibility-failure |
| Instrumental Behaviors | Behavior | Demonstrated Lab | Current | Catastrophic | Easy | manifestation of instrumental-convergence; overlaps power-seeking |
| Capability Concerns | Outcome | Observed Current | Current | High | Moderate | enables sharp-left-turn; overlaps distributional-shift |
| Catastrophic Scenarios | Outcome | Theoretical | Medium Term | Existential | Very Difficult | requires deceptive-alignment; requires instrumental-convergence; requires scheming |
| Catastrophic Scenarios | Outcome | Speculative | Medium Term | Existential | Very Difficult | overlaps goal-misgeneralization; requires emergent-capabilities |
| Human-AI Interaction | Outcome | Observed Current | Current | Medium | Moderate | overlaps sycophancy |
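
The relation labels used in the table (requires, enables, overlaps, manifestation of, special case of) amount to a typed, directed graph over risks. The following is a minimal Python sketch of how such a graph might be stored and queried. It encodes only the two relationships stated in the introduction (scheming as a manifestation of deceptive alignment, and deceptive alignment requiring mesa-optimization). The names `RELATION_TYPES`, `EDGES`, `build_graph`, and `upstream` are hypothetical and chosen for illustration; they are not part of the original analysis.

```python
from collections import defaultdict

# Relation labels that appear in the table's related-risks column.
RELATION_TYPES = {
    "requires", "enables", "overlaps", "manifestation of", "special case of",
}

# Assumption: only the two relationships spelled out in the introduction
# are encoded here; further edges would be added from the table rows.
EDGES = [
    ("scheming", "manifestation of", "deceptive-alignment"),
    ("deceptive-alignment", "requires", "mesa-optimization"),
]


def build_graph(edges):
    """Map each risk to its outgoing (relation, target) pairs."""
    graph = defaultdict(list)
    for source, relation, target in edges:
        if relation not in RELATION_TYPES:
            raise ValueError(f"unknown relation: {relation!r}")
        graph[source].append((relation, target))
    return graph


def upstream(graph, risk,
             relations=("requires", "manifestation of", "special case of")):
    """Return the risks that `risk` depends on, directly or indirectly."""
    found = []
    stack = [risk]
    while stack:
        current = stack.pop()
        # .get avoids inserting empty entries into the defaultdict.
        for relation, target in graph.get(current, []):
            if relation in relations and target not in found:
                found.append(target)
                stack.append(target)
    return found


graph = build_graph(EDGES)
print(upstream(graph, "scheming"))
```

Treating "requires", "manifestation of", and "special case of" as dependency-like relations, while leaving "overlaps" and "enables" out of the traversal, is a design choice of this sketch rather than something the table specifies.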