Goodhart's Law (AI Alignment Forum Wiki)
Credibility Rating: blog
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: Alignment Forum
A foundational wiki entry on a core AI alignment concept; the Garrabrant taxonomy it summarizes is widely cited in technical alignment literature on specification gaming and reward hacking.
Metadata
Summary
This wiki entry explains Goodhart's Law—when a proxy measure becomes the optimization target, it ceases to be a good proxy—and its critical relevance to AI alignment. It presents Scott Garrabrant's taxonomy of four Goodhart failure modes: regressional, causal, extremal, and adversarial, each describing a distinct mechanism by which proxy measures break down under optimization pressure.
Key Points
- Goodhart's Law: optimizing a proxy measure causes it to stop accurately representing the underlying goal it was meant to capture.
- AI alignment risk: a powerful AI optimizing even a good proxy for human values may cause that proxy to catastrophically break down.
- Regressional Goodhart: optimization selects for noise in the proxy, not just the true underlying goal.
- Causal Goodhart: when the proxy-goal correlation is not causal, intervening on the proxy may fail to affect the goal.
- Extremal Goodhart: at extreme values, the proxy-goal correlation observed under normal conditions may no longer hold.
- Adversarial Goodhart: optimization creates incentives for agents to game the proxy, actively destroying its correlation with the true goal.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Reward Hacking | Risk | 91.0 |
Cached Content Preview
Archived by the Wayback Machine (Common Crawl collection): http://web.archive.org/web/20260215060649/https://www.alignmentforum.org/w/goodhart-s-law
Goodhart's Law
Edited by Ruby, Vladimir_Nesov, et al. last updated 19th Mar 2023
Goodhart's Law states that when a proxy for some value becomes the target of optimization pressure, it ceases to be a good proxy. One form of Goodhart is illustrated by the Soviet story of a factory graded on how many shoes it produced (normally a good proxy for productivity): the factory soon began producing a large number of tiny shoes. Useless, but the numbers looked good.
Goodhart's Law is of particular relevance to AI alignment. Suppose you have a measure that is generally a good proxy for "the stuff that humans care about." It would still be dangerous to have a powerful AI optimize for that proxy: in accordance with Goodhart's Law, the proxy will break down under optimization pressure.
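The shoe-factory failure mode above can be sketched as a toy optimization. The material budget, shoe sizes, and size threshold below are illustrative assumptions, not figures from the source; the point is only that the proxy (shoe count) and the goal (wearable shoes) diverge once the proxy is optimized:

```python
# Hypothetical factory with a fixed material budget. "Shoes produced"
# is the proxy; "useful shoes" (at least normal size) is the real goal.
MATERIAL_BUDGET = 100.0  # arbitrary units of leather

def shoes_produced(size):
    # Smaller shoes use less material, so more of them fit in the budget.
    return MATERIAL_BUDGET / size

def useful_shoes(size, min_size=1.0):
    # Only full-size shoes count toward the real goal of footwear people can wear.
    return shoes_produced(size) if size >= min_size else 0.0

# Optimization pressure: pick the shoe size that maximizes the proxy.
sizes = [0.1, 0.5, 1.0, 1.5]
best = max(sizes, key=shoes_produced)

print(best)                  # tiny shoes win on the proxy
print(shoes_produced(best))  # roughly 1000 shoes on paper
print(useful_shoes(best))    # zero shoes anyone can wear
```

Under no optimization (sizes chosen for other reasons), shoe count tracks useful output reasonably well; it is the act of maximizing the count that drives the two apart.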
Goodhart Taxonomy
In Goodhart Taxonomy, Scott Garrabrant identifies four kinds of Goodharting:
Regressional Goodhart - When selecting for a proxy measure, you select not only for the true goal, but also for the difference between the proxy and the goal.
Causal Goodhart - When there is a non-causal correlation between the proxy and the goal, intervening on the proxy may fail to intervene on the goal.
Extremal Goodhart - Worlds in which the proxy takes an extreme value may be very different from the ordinary worlds in which the correlation between the proxy and the goal was observed.
Adversarial Goodhart - When you optimize for a proxy, you provide an incentive for adversaries to correlate their goal with your proxy, thus destroying the correlation with your goal.
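Regressional Goodhart, the first mode above, can be demonstrated with a small simulation (a sketch with assumed distributions: goal values are standard normal and the proxy is the goal plus independent noise). Selecting the candidate with the best proxy score also selects for the noise, so the winner's proxy score systematically overstates its true goal value:

```python
import random
import statistics

random.seed(0)

def sample_candidate():
    # True goal value, plus a proxy equal to the goal plus independent
    # noise -- so proxy and goal correlate across the population.
    goal = random.gauss(0, 1)
    proxy = goal + random.gauss(0, 1)
    return goal, proxy

# Optimization pressure: from each pool of 100 candidates, keep the one
# with the highest proxy score, then record its true goal value.
selected = []
for _ in range(2000):
    pool = [sample_candidate() for _ in range(100)]
    selected.append(max(pool, key=lambda c: c[1]))

mean_goal = statistics.mean(g for g, _ in selected)
mean_proxy = statistics.mean(p for _, p in selected)
print(f"mean proxy of selected candidates: {mean_proxy:.2f}")
print(f"mean goal of selected candidates:  {mean_goal:.2f}")
# The winners' proxy scores overstate their true goal values: selecting
# hard on the proxy selects on its noise term as well as on the goal.
```

The gap between the two means is the regressional-Goodhart effect; it grows with the strength of selection (pool size) and with the noise in the proxy.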
See Also
Groupthink, Information cascade, Affective death spiral
Adaptation executers, Superstimulus
Signaling, Filtered evidence
Cached thought
Modesty argument, Egalitarianism
Rationalization, Dark arts
Epistemic hygiene
Scoring rule
Posts tagged Goodhart's Law
| Karma | Title | Author(s) | Age | Comments |
|---|---|---|---|---|
| 54 | Goodhart Taxonomy | Scott Garrabrant | 8y | 23 / 5 |
| 26 | Classifying specification problems as variants of Goodhart's Law | Vika | 6y | 5 / 4 |
| 18 | Specification gaming examples in AI | Vika | 8y | 8 / 5 |
| 62 | When is Goodhart catastrophic? | Drake Thomas, Thomas Kwa | 3y | 15 / 3 |
| 10 | Goodhart's Curse and Limitations on AI Alignment | Gordon Seidoh Worley | 6y | 0 / 3 |
| 24 | How does Gradient Descent Interact with Goodhart? [Question] | Scott Garrabrant, evhub | 7y | 4 / 2 |

22 Intr… (list truncated, 4 KB total)