Misalignment Potential

Page Status

Page Type: AI Transition Model (structured factor/scenario/parameter page)

Last edited: 2026-01-03

Critical Insights (2):

Neglected: Human oversight quality degrades as AI capability increases; the very systems most in need of oversight become the hardest to oversee. (S: 2.8, I: 4.3, A: 3.8)
Quant.: AI safety funding is roughly 1-2% of AI capabilities R&D spending at frontier labs, about $100-200M vs. $10B+ annually. (S: 2.5, I: 4.0, A: 3.5)

ATM Factor: Misalignment Potential

Model Role: Root Factor (AI System)

Key Parameters: Alignment Robustness, Interpretability Coverage, Human Oversight Quality

Primary Outcome: Existential Catastrophe

Misalignment Potential measures the likelihood that AI systems will pursue goals other than those we intend. As an aggregate, it combines the technical and organizational factors that determine whether advanced AI systems might behave in harmful ways despite our efforts to align them.

Primary outcome affected: Existential Catastrophe ↑↑↑

When misalignment potential is high, catastrophic loss of control, accidents at scale, and goal divergence become more likely. Reducing this potential is the most direct lever for reducing existential and catastrophic AI risk.

Component Parameters

Internal Dynamics

These components interact:

Interpretability enables alignment verification: We can only confirm alignment if we understand model internals
Safety culture sustains investment: Without organizational commitment, safety research loses funding to capabilities research
Oversight requires interpretability: Human overseers need tools to understand what systems are doing
Gap closure requires all components: No single factor is sufficient; safety capacity emerges from their combination

How This Affects Scenarios

↑ AI Takeover

Core driver of takeover risk - misaligned AI is the primary mechanism

↑ Long-term Lock-in

Misaligned AI could lock in bad values or power structures

↑ Human-Caused Catastrophe

Indirect effect through unsafe AI tools being misused

Related Pages

Existential Catastrophe: The outcome this factor primarily affects
Misuse Potential: The complementary factor for human-caused catastrophe

What Drives Misalignment Potential?

The three pillars of alignment assurance, their drivers, and key uncertainties.

[Diagram. Node types: root causes, derived factors, direct factors, target. Arrow strength: strong, medium, weak.]

What links here

AI Capabilities (ai-transition-model-factor): amplifies
Existential Catastrophe (ai-transition-model-scenario): driver
AI Takeover (ai-transition-model-scenario): driven-by