Analytical Models
Overview
This section contains analytical models that provide structured ways to think about AI risks, their interactions, and potential interventions. These models help quantify uncertainties, map causal relationships, and identify leverage points.
Model Categories
Framework Models
Foundational frameworks for AI risk analysis:
- Carlsmith's Six Premises - Probability decomposition for AI x-risk
- Instrumental Convergence Framework - Why AI might seek power
- Defense in Depth Model - Layered safety approaches
- Capability Threshold Model - When risks become acute
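The Defense in Depth model's core arithmetic is worth making concrete: independent safety layers multiply their failure probabilities, but correlated failures (the model cites deceptive alignment as a source of correlation around ρ≈0.4-0.5) erode most of the benefit. A minimal sketch with illustrative layer numbers and a deliberately crude interpolation between independence and perfect correlation:

```python
# Sketch of defense-in-depth arithmetic. Layer failure probabilities and
# the interpolation rule are illustrative, not the model's exact method.

def independent_failure(layer_failure_probs):
    """P(all layers fail) assuming layers fail independently."""
    p = 1.0
    for f in layer_failure_probs:
        p *= f
    return p

def correlated_failure_bound(layer_failure_probs, rho):
    """Crude interpolation between independence (rho=0) and perfect
    correlation (rho=1), where only the strongest layer matters."""
    indep = independent_failure(layer_failure_probs)
    best_layer = min(layer_failure_probs)  # lowest individual failure rate
    return (1 - rho) * indep + rho * best_layer

layers = [0.4, 0.3, 0.2]  # three layers in the model's 20-60% failure range
print(round(independent_failure(layers), 4))          # 0.024 if independent
print(round(correlated_failure_bound(layers, 0.45), 4))  # 0.1032 with rho=0.45
```

Even modest correlation moves the combined failure rate from a few percent back toward the strongest single layer, which is why the model treats correlated failure modes as the central threat to layered safety.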
Risk Models
Models of specific risk mechanisms:
- Scheming Likelihood Model - When AI might deceive
- Deceptive Alignment Decomposition - Components of deception risk
- Mesa-Optimization Analysis - Inner optimizer emergence
- Power-Seeking Conditions - When power-seeking emerges
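Several of these models share one structure: overall risk is a product of conditional probabilities. A minimal sketch of the five-condition chain used by the deceptive alignment decomposition, with illustrative placeholder probabilities rather than the model's published estimates:

```python
# Sketch of a multiplicative risk decomposition. Condition names follow the
# deceptive-alignment model; the probabilities are illustrative placeholders.
from math import prod

conditions = {
    "mesa_optimizer_emerges": 0.4,  # P(a learned optimizer arises)
    "goal_is_misaligned":     0.5,  # conditional on the previous condition
    "situationally_aware":    0.5,
    "chooses_deception":      0.5,
    "deception_survives":     0.5,  # persists through training to deployment
}

overall = prod(conditions.values())
print(f"P(deceptive alignment) = {overall:.3f}")  # 0.025 with these inputs
```

The product structure gives every condition equal multiplicative leverage: halving any single factor halves the total, which is why these models emphasize finding the condition that is cheapest to intervene on.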
Dynamics Models
Models of how factors evolve and interact:
- Racing Dynamics Impact - Competition effects on safety
- Feedback Loops - Self-reinforcing dynamics
- Risk Interaction Matrix - How risks compound
- Lab Incentives Model - What drives lab behavior
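The feedback-loop model's headline rates (capabilities growing ~2.5x/year versus safety at ~1.2x/year) compound dramatically. A short sketch of that gap, assuming the rates stay constant over an illustrative 10-year horizon:

```python
# Sketch of the compounding capability/safety gap. Growth rates come from
# the feedback-loop model's summary; the 10-year horizon is illustrative.
cap_rate, safety_rate = 2.5, 1.2
capability = safety = 1.0
for year in range(10):
    capability *= cap_rate
    safety *= safety_rate

gap = capability / safety
print(f"capability/safety gap after 10 years: ~{gap:.0f}x")  # roughly 1500x
```

Constant exponential rates are a strong assumption, but the sketch shows why even a modest per-year differential dominates on decade timescales.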
Societal Models
Models of broader societal impacts:
- Trust Erosion Dynamics - How trust degrades
- Lock-in Mechanisms - What creates irreversibility
- Expertise Atrophy Progression - Skill loss trajectories
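The trust-erosion model's central asymmetry (trust erodes 3-10x faster than it rebuilds) can be sketched with a simple update rule. The shock frequency and rates here are illustrative, not the model's calibrated values:

```python
# Sketch of asymmetric trust dynamics: sharp multiplicative erosion on
# shocks, slow proportional recovery otherwise. All rates are illustrative.

def step(trust, shock, erode_rate=0.30, rebuild_rate=0.05):
    """One period: a shock erodes trust sharply; otherwise it rebuilds slowly."""
    if shock:
        return trust * (1 - erode_rate)
    return min(1.0, trust + rebuild_rate * (1 - trust))

trust = 0.8
for period in range(20):
    trust = step(trust, shock=(period % 5 == 0))  # one shock every 5 periods

print(f"trust after 20 periods: {trust:.2f}")  # ends well below the 0.8 start
```

Because each shock removes a fraction of the current level while recovery only closes a fraction of the remaining gap, periodic shocks ratchet trust downward even when most periods are calm.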
Intervention Models
Models for evaluating and prioritizing responses:
- Intervention Effectiveness Matrix - Comparing approaches
- Safety Research Value - Research prioritization
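The intervention matrix's misallocation claim is, at bottom, a per-dollar comparison. A sketch of that ranking logic with hypothetical funding and effectiveness figures, only loosely echoing the model's summary:

```python
# Sketch of ranking interventions by effectiveness per dollar. The names,
# funding levels, and effectiveness scores are hypothetical illustrations.

interventions = [
    # (name, annual funding in $M, estimated effectiveness 0-1)
    ("RLHF-style methods",   400, 0.15),
    ("interpretability",     100, 0.35),
    ("alignment theory",      50, 0.45),
    ("governance/standards",  60, 0.40),
]

ranked = sorted(interventions, key=lambda x: x[2] / x[1], reverse=True)
for name, funding, eff in ranked:
    print(f"{name:22s} {eff / funding * 1000:.1f} effectiveness per $B")
```

Under these made-up numbers the best-funded line item ranks last per dollar, which is the shape of the misallocation argument: marginal returns, not total effectiveness, drive the recommended reallocation.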
Using These Models
Models include:
- Quantitative estimates with uncertainty ranges
- Causal diagrams showing factor relationships
- Scenario analysis exploring different assumptions
- Key cruxes that most affect conclusions
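Uncertainty ranges like those above can be combined by Monte Carlo sampling rather than point estimates. A minimal sketch, with made-up factor ranges:

```python
# Sketch of propagating uncertainty ranges through a multiplicative model:
# sample each factor uniformly from its range, multiply, report the spread.
# The three factor ranges are illustrative placeholders.
import random

random.seed(0)
ranges = [(0.2, 0.8), (0.3, 0.7), (0.1, 0.9)]

samples = []
for _ in range(10_000):
    product = 1.0
    for low, high in ranges:
        product *= random.uniform(low, high)
    samples.append(product)

samples.sort()
lo, med, hi = samples[500], samples[5000], samples[9500]
print(f"90% interval: {lo:.3f} - {hi:.3f}, median {med:.3f}")
```

The resulting interval is typically much wider than multiplying the midpoints would suggest, which is why the models report ranges and flag the cruxes that dominate the spread.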
See individual model pages for detailed methodology and limitations.