Analytical Models
Overview
This section contains analytical models that provide structured ways to think about AI risks, their interactions, and potential interventions. These models help quantify uncertainties, map causal relationships, and identify leverage points.
Model Categories
Foundational frameworks for AI risk analysis:
- Carlsmith’s Six Premises - Probability decomposition for AI x-risk
- Instrumental Convergence Framework - Why AI might seek power
- Defense in Depth Model - Layered safety approaches
- Capability Threshold Model - When risks become acute
Models of specific risk mechanisms:
- Scheming Likelihood Model - When AI might deceive
- Deceptive Alignment Decomposition - Components of deception risk
- Mesa-Optimization Analysis - Inner optimizer emergence
- Power-Seeking Conditions - When power-seeking emerges
Models of how factors evolve and interact:
- Racing Dynamics Impact - Competition effects on safety
- Feedback Loops - Self-reinforcing dynamics
- Risk Interaction Matrix - How risks compound
- Lab Incentives Model - What drives lab behavior
Models of broader societal impacts:
- Trust Erosion Dynamics - How trust degrades
- Lock-in Mechanisms - What creates irreversibility
- Expertise Atrophy Progression - Skill loss trajectories
Models for evaluating and prioritizing responses:
- Intervention Effectiveness Matrix - Comparing approaches
- Safety Research Value - Research prioritization
Using These Models
Models include:
- Quantitative estimates with uncertainty ranges
- Causal diagrams showing factor relationships
- Scenario analysis exploring different assumptions
- Key cruxes that most affect conclusions
See individual model pages for detailed methodology and limitations.
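As a rough illustration of what "quantitative estimates with uncertainty ranges" can look like for a probability decomposition, the sketch below multiplies several conditional probabilities, each drawn from a range, and reports a spread for the result. The function name `sample_joint_probability`, the uniform sampling, the independence between draws, and the numeric ranges are all assumptions made for this example; they are placeholders and are not taken from any model page.

```python
import random
import statistics


def sample_joint_probability(premise_ranges, n_samples=10_000, seed=0):
    """Monte Carlo sketch of a product of conditional probabilities.

    Each premise is a (low, high) range. A value is drawn uniformly and
    independently from each range (an assumption of this sketch), and the
    draws are multiplied to give one sample of the joint probability.
    """
    rng = random.Random(seed)
    products = []
    for _ in range(n_samples):
        p = 1.0
        for low, high in premise_ranges:
            p *= rng.uniform(low, high)
        products.append(p)
    products.sort()
    return {
        "p05": products[int(0.05 * n_samples)],   # approximate 5th percentile
        "median": statistics.median(products),
        "p95": products[int(0.95 * n_samples)],   # approximate 95th percentile
    }


# Hypothetical placeholder ranges, NOT estimates from any model page.
example_ranges = [(0.5, 0.9), (0.3, 0.7), (0.2, 0.6)]
summary = sample_joint_probability(example_ranges)
print(summary)
```

Reporting a percentile spread instead of a single point estimate shows how uncertainty in the individual premises compounds in the product. A real model would likely use different distributions and account for correlations between premises.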