What Affects Alignment Robustness?
← All DiagramsView Alignment Robustness →
Causal factors affecting how reliably AI systems pursue intended goals. 1-2% reward hacking rates in frontier models.
Computing layout...
Legend
Node Types
Root Causes
Derived
Direct Factors
Target
Arrow Strength
Strong
Medium
Weak