Analytical Models
Overview
This section contains analytical models that provide structured ways to think about AI risks, their interactions, and potential interventions. These models help quantify uncertainties, map causal relationships, and identify leverage points.
Model Categories
Framework Models
Foundational frameworks for AI risk analysis:
Carlsmith's Six Premises - Probability decomposition for AI x-risk
Instrumental Convergence Framework - Why AI might seek power
Defense in Depth Model - Layered safety approaches
Capability Threshold Model - When risks become acute
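Several of the framework models above share a common structure: an overall risk estimate expressed as a product of conditional probabilities, one per premise. A minimal sketch of that structure, using illustrative placeholder probabilities (not values endorsed by any of the linked pages):

```python
# Carlsmith-style decomposition: overall risk is the product of conditional
# premise probabilities. The numbers below are illustrative placeholders.
premises = {
    "advanced AI arrives by the target date": 0.65,
    "strong incentives to build and deploy it": 0.80,
    "alignment turns out to be hard": 0.40,
    "misaligned power-seeking occurs": 0.65,
    "power-seeking scales to full disempowerment": 0.40,
    "disempowerment constitutes a catastrophe": 0.95,
}

risk = 1.0
for premise, p in premises.items():
    risk *= p

print(f"Decomposed risk estimate: {risk:.1%}")
```

With these placeholders the product comes out near 5%, which is why small shifts in any single premise move the headline number substantially.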
Risk Models
Models of specific risk mechanisms:
Scheming Likelihood Model - When AI might deceive
Deceptive Alignment Decomposition - Components of deception risk
Mesa-Optimization Analysis - Inner optimizer emergence
Power-Seeking Conditions - When power-seeking emerges
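The scheming and deceptive-alignment models above are both multiplicative decompositions with ranges rather than point estimates. Propagating low and high bounds through the product shows why such models span orders of magnitude. The condition names and ranges below are illustrative, not the models' published figures:

```python
# Interval propagation through a multiplicative risk decomposition.
# Ranges are illustrative placeholders, not the linked models' values.
conditions = {
    "mesa-optimization emerges": (0.10, 0.70),
    "mesa-objective is misaligned": (0.50, 0.90),
    "model is situationally aware": (0.30, 0.80),
    "model deceives during training": (0.40, 0.80),
    "deception survives to deployment": (0.40, 0.90),
}

low = high = 1.0
for name, (lo, hi) in conditions.items():
    low *= lo
    high *= hi

print(f"Overall risk range: {low:.2%} - {high:.1%}")
```

Because every factor multiplies in, the low-end product shrinks much faster than the high end, so the interval widens with each added condition.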
Dynamics Models
Models of how factors evolve and interact:
Racing Dynamics Impact - Competition effects on safety
Feedback Loops - Self-reinforcing dynamics
Risk Interaction Matrix - How risks compound
Lab Incentives Model - What drives lab behavior
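The feedback-loop model's headline numbers (capabilities growing roughly 2.5x/year versus safety at 1.2x/year) imply a compounding gap. A toy projection of that ratio, not a forecast:

```python
# Toy projection of the capability-safety gap implied by the feedback-loop
# model's summary growth rates (2.5x/year vs 1.2x/year). Illustrative only.
cap_growth, safety_growth = 2.5, 1.2
capability = safety = 1.0

for year in range(1, 6):
    capability *= cap_growth
    safety *= safety_growth
    print(f"year {year}: capability/safety ratio = {capability / safety:.1f}")
```

The ratio grows as (2.5/1.2)^t, roughly doubling each year, which is the mechanism behind the model's claim that positive loops outpace the negative ones.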
Societal Models
Models of broader societal impacts:
Trust Erosion Dynamics - How trust degrades
Lock-in Mechanisms - What creates irreversibility
Expertise Atrophy Progression - Skill loss trajectories
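The trust-erosion model's core claim is asymmetry: trust erodes several times faster than it rebuilds (the model estimates 3-10x). A toy illustration of why equal recovery time does not restore the starting level, with illustrative rates:

```python
# Toy asymmetric trust dynamics: erosion is proportional to current trust,
# rebuilding is proportional to the remaining gap, at a ~5x slower rate.
# Rates are illustrative placeholders.
erode_rate, rebuild_rate = 0.10, 0.02

trust = 0.50
for _ in range(10):                      # ten periods of erosion pressure
    trust -= erode_rate * trust
eroded = trust

for _ in range(10):                      # ten periods of rebuilding
    trust += rebuild_rate * (1.0 - trust)

print(f"after shock: {eroded:.2f}, after equal recovery time: {trust:.2f}")
```

Even with the erosion pressure fully removed, trust ends well below its starting point, which is the hysteresis effect the model describes.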
Intervention Models
Models for evaluating and prioritizing responses:
Intervention Effectiveness Matrix - Comparing approaches
Safety Research Value - Research prioritization
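The effectiveness matrix's misallocation argument reduces to a ranking by effectiveness per dollar. A sketch of that logic with illustrative placeholder figures (not the matrix's published numbers):

```python
# Rank interventions by estimated effectiveness per dollar. All figures
# below are illustrative placeholders, not the matrix's published values.
interventions = [
    # (name, annual funding in $M, estimated effectiveness 0-1)
    ("RLHF-style methods", 400, 0.15),
    ("interpretability", 80, 0.35),
    ("alignment theory", 40, 0.40),
    ("governance/standards", 60, 0.30),
]

ranked = sorted(interventions, key=lambda x: x[2] / x[1], reverse=True)
for name, funding, eff in ranked:
    print(f"{name}: {eff / funding * 1000:.1f} effectiveness units per $B")
```

The point of the exercise: the best-funded row can land last in the ranking, which is the matrix's misallocation claim in miniature.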
Using These Models
Models include:
Quantitative estimates with uncertainty ranges
Causal diagrams showing factor relationships
Scenario analysis exploring different assumptions
Key cruxes that most affect conclusions
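The "quantitative estimates with uncertainty ranges" above are typically produced by sampling each factor from its range and examining the spread of the result. A minimal Monte Carlo sketch, with illustrative ranges:

```python
# Minimal Monte Carlo propagation: sample each factor uniformly from its
# range and summarize the distribution of the product. Ranges illustrative.
import random

random.seed(0)
ranges = [(0.4, 0.8), (0.2, 0.6), (0.3, 0.9)]

samples = []
for _ in range(10_000):
    p = 1.0
    for lo, hi in ranges:
        p *= random.uniform(lo, hi)
    samples.append(p)

samples.sort()
median = samples[len(samples) // 2]
p5, p95 = samples[500], samples[9500]
print(f"median {median:.3f}, 90% interval [{p5:.3f}, {p95:.3f}]")
```

The model pages linked above generally do this with Squiggle rather than raw sampling loops, but the underlying operation is the same.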
See individual model pages for detailed methodology and limitations.