AI Accident Risks: Overlap Analysis


Comparative analysis of AI accident risks with explicit handling of overlaps and relationships. Many risks are closely related: scheming, for example, is the behavioral expression of deceptive alignment, which in turn requires mesa-optimization as a precondition.

Key insight: Risks exist at different levels of abstraction. Theoretical frameworks (mesa-optimization, instrumental convergence) describe why problems occur. Mechanisms (deceptive alignment, goal misgeneralization) describe how failures happen. Behaviors (scheming, power-seeking) are what we observe. Outcomes (treacherous turn, sharp left turn) are the resulting scenarios.
Handling overlaps: Each risk shows its related risks with relationship types: requires (needs the other as precondition), enables (can lead to), overlaps (conceptual similarity), manifestation-of (behavioral expression of), special-case-of (specific instance).
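The relationship types above form a small directed graph over the risks. As a minimal sketch (the risk names and edges are copied from the rows in this analysis, but the dict representation and the `preconditions` helper are illustrative, not part of the original), the graph can be stored as a dict and queried for transitive preconditions:

```python
# Each risk maps relationship type -> related risks. Only a subset of the
# full table is shown; edges are taken from the rows in this analysis.
RISKS = {
    "treacherous-turn": {"requires": ["deceptive-alignment", "instrumental-convergence", "scheming"]},
    "deceptive-alignment": {"requires": ["mesa-optimization"], "enables": ["scheming", "treacherous-turn"]},
    "scheming": {"manifestation-of": ["deceptive-alignment"], "overlaps": ["sandbagging"]},
    "instrumental-convergence": {"enables": ["power-seeking", "corrigibility-failure", "treacherous-turn"]},
    "mesa-optimization": {"enables": ["deceptive-alignment", "goal-misgeneralization"]},
}

def preconditions(risk, seen=None):
    """All risks transitively reachable from `risk` via 'requires' edges."""
    seen = set() if seen is None else seen
    for dep in RISKS.get(risk, {}).get("requires", []):
        if dep not in seen:
            seen.add(dep)
            preconditions(dep, seen)
    return seen

print(sorted(preconditions("treacherous-turn")))
# Treacherous turn requires deceptive alignment, instrumental convergence,
# and scheming directly, plus mesa-optimization through deceptive alignment.
```

This makes the layering concrete: querying an outcome-level risk walks back through mechanism-level and theoretical-level risks.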
Abstraction Level
- Theoretical: foundational concepts
- Mechanism: how failures occur
- Behavior: observable actions
- Outcome: resulting scenarios

Evidence
- Observed (Current): observed in current systems
- Demonstrated (Lab): demonstrated in lab experiments
- Theoretical: argued from first principles
- Speculative: hypothesized

Timeline
- Current: now
- Near Term: 1-3 years
- Medium Term: 3-10 years
- Long Term: 10+ years

Severity
- Existential: extinction risk
- Catastrophic: civilizational-scale harm
- High: significant harm
- Medium: real harm
- Low: minor harm

Detectability
- Easy: obvious
- Moderate: detectable with effort
- Difficult: requires sophisticated methods
- Very Difficult: may be impossible

Relationships
- requires: needs the other as a precondition
- enables: can lead to
- overlaps: conceptual similarity
- manifestation-of: behavioral expression of
- special-case-of: specific instance of
Theoretical Frameworks
- mesa-optimization (Theoretical): Evidence: Theoretical; Timeline: Uncertain; Severity: Catastrophic; Detectability: Very Difficult. Related: enables deceptive-alignment; enables goal-misgeneralization.
- instrumental-convergence (Theoretical): Evidence: Demonstrated (Lab); Timeline: Current; Severity: Existential; Detectability: Moderate. Related: enables power-seeking; enables corrigibility-failure; enables treacherous-turn.

Alignment Failures
- deceptive-alignment (Mechanism): Evidence: Demonstrated (Lab); Timeline: Near Term; Severity: Existential; Detectability: Very Difficult. Related: requires mesa-optimization; enables scheming; enables treacherous-turn.
- goal-misgeneralization (Mechanism): Evidence: Demonstrated (Lab); Timeline: Current; Severity: High; Detectability: Moderate. Related: requires mesa-optimization; overlaps distributional-shift; overlaps deceptive-alignment.

Specification Problems
- reward-hacking (Mechanism): Evidence: Observed (Current); Timeline: Current; Severity: Medium; Detectability: Moderate. Related: enables sycophancy; overlaps goal-misgeneralization.
- distributional-shift (Mechanism): Evidence: Observed (Current); Timeline: Current; Severity: Medium; Detectability: Moderate. Related: enables goal-misgeneralization; overlaps emergent-capabilities.
- sycophancy (Behavior): Evidence: Observed (Current); Timeline: Current; Severity: Medium; Detectability: Easy. Related: special-case-of reward-hacking.

Deceptive Behaviors
- scheming (Behavior): Evidence: Demonstrated (Lab); Timeline: Current; Severity: Catastrophic; Detectability: Difficult. Related: manifestation-of deceptive-alignment; overlaps sandbagging; enables treacherous-turn.
- (risk name not captured) (Behavior): Evidence: Demonstrated (Lab); Timeline: Current; Severity: High; Detectability: Difficult. Related: special-case-of scheming; manifestation-of deceptive-alignment.
- sandbagging (Behavior): Evidence: Demonstrated (Lab); Timeline: Near Term; Severity: High; Detectability: Very Difficult. Related: overlaps scheming.

Instrumental Behaviors
- power-seeking (Behavior): Evidence: Demonstrated (Lab); Timeline: Current; Severity: Existential; Detectability: Moderate. Related: manifestation-of instrumental-convergence; overlaps corrigibility-failure.
- corrigibility-failure (Behavior): Evidence: Demonstrated (Lab); Timeline: Current; Severity: Catastrophic; Detectability: Easy. Related: manifestation-of instrumental-convergence; overlaps power-seeking.

Capability Concerns
- emergent-capabilities (Outcome): Evidence: Observed (Current); Timeline: Current; Severity: High; Detectability: Moderate. Related: enables sharp-left-turn; overlaps distributional-shift.

Catastrophic Scenarios
- treacherous-turn (Outcome): Evidence: Theoretical; Timeline: Medium Term; Severity: Existential; Detectability: Very Difficult. Related: requires deceptive-alignment; requires instrumental-convergence; requires scheming.
- sharp-left-turn (Outcome): Evidence: Speculative; Timeline: Medium Term; Severity: Existential; Detectability: Very Difficult. Related: overlaps goal-misgeneralization; requires emergent-capabilities.

Human-AI Interaction
- (risk name not captured) (Outcome): Evidence: Observed (Current); Timeline: Current; Severity: Medium; Detectability: Moderate. Related: overlaps sycophancy.
16 risks across 8 categories