Skip to content

What Affects Alignment Robustness?

← All Diagrams

Causal factors affecting how reliably AI systems pursue intended goals. 1-2% reward hacking rates in frontier models.

View Alignment Robustness
Computing layout...
Legend
Node Types
Root Causes
Derived
Direct Factors
Target
Arrow Strength
Strong
Medium
Weak