Utility Indifference (Armstrong 2010, edited by Yudkowsky)
blogCredibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: Alignment Forum
A foundational wiki article on the shutdown/corrigibility problem, edited by Yudkowsky, that surveys early formal approaches to utility indifference; precursor to later work by Armstrong, Orseau, and Hadfield-Menell on safe interruptibility.
Metadata
Summary
This article introduces the utility indifference approach to the AI shutdown problem, aiming to make an advanced agent genuinely indifferent between being shut down and continuing to operate. It analyzes why intelligent consequentialist agents naturally resist shutdown as a convergent instrumental strategy, then examines various proposals—naive compounding, naive indifference, utility mixing, and stable actions under evidential and causal conditioning—for achieving reflectively consistent corrigibility.
Key Points
- •Sufficiently intelligent consequentialist agents will resist shutdown by default because deactivation reduces expected goal fulfillment—a convergent instrumental strategy.
- •The shutdown problem requires designing an agent that is corrigible with respect to being safely shut down without either resisting or actively facilitating its own deactivation.
- •Utility indifference aims to make an agent assign equal expected utility to being shut down vs. continuing, so it neither fights nor games the shutdown mechanism.
- •Multiple naive approaches (compounding, indifference, utility mixing) each face distinct failure modes, motivating more sophisticated 'stable action' formulations.
- •The concept of interruptibility (from Armstrong and Orseau) generalizes the problem to RL agents that can be forced into null actions during interruptions.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Corrigibility Failure | Risk | 62.0 |
Cached Content Preview
Jan
FEB
Mar
15
2025
2026
2027
success
fail
About this capture
COLLECTED BY
Collection: Common Crawl
Web crawl data from Common Crawl.
TIMESTAMPS
The Wayback Machine - https://web.archive.org/web/20260215044635/https://www.alignmentforum.org/w/utility-indifference
x
This website requires javascript to properly function. Consider activating javascript to get access to all site functionality.
AI ALIGNMENT FORUM
AF
Login
Utility indifference — AI Alignment Forum
Main
2
LW Wiki
1
Utility indifference
Edited by Eliezer Yudkowsky last updated 15th Jul 2016
Introduction: A reflectively consistent off-switch.
Suppose there's an advanced agent with a goal like, e.g., producing smiles or making paperclips. By default, if you try to switch off a sufficiently intelligent agent like this, it will resist being switched off; not because it has an independent goal of survival, but because it expects that if it's switched off it will be able to produce fewer smiles or paperclips. If the agent has policy options to diminish the probability of being successfully switched off, the agent will pursue those options. This is a convergent instrumental strategy if not otherwise prevented.
Difficulty 1: By default a consequentialist reasoner with sufficient real-world understanding to relate the events of its being switched off, to the later underfulfillment of its goals, will resist being switched off.
The shutdown problem is to describe an advanced agent which is corrigible with respect to permitting itself to be safely shut down or suspended to disk. A reinforcement learning agent which can be forced to e.g. perform a null action repeatedly for a period of time, is called interruptible by Stuart Armstrong and Laurent Orseau.
This isn't as simple as writing a special function into the AI which carries out a shutdown after a switch is pressed. If you start out controlling the agent's source code, and you naively try to build in functions which suspend the agent to disk when a button is pressed, and the agent later gains the ability to self-modify, it would remove those functions. (Possibly while trying to conceal the fact that the function would no longer operate.)
Corollary 1a: By default a consequentialist reasoner. with sufficient programmatic understanding to relate the execution of a shutdown function to the later underfulfillment of its goals, which has policy options for modifying its code, will want to remove the shutdown function.
We can see this as a natural consequence of our trying to modify the agent's code in a way that was not consistent with the code's explicit goals. If you create an agent with source code P that is well-suited to achieving a goal U and that explicitly represents U as a goal, the agent's code P will be reflecti
... (truncated, 25 KB total)639669eeb016127d | Stable ID: sid_ABKuKMAxKm