Christiano, P. (2017). "Corrigibility."
Web
A foundational blog post by Paul Christiano that broadens the concept of corrigibility beyond shutdown compliance and links it to his act-based agent framework, providing theoretical grounding for why corrigibility may be achievable and self-reinforcing.
Metadata
Importance: 72/100 | Blog post | Primary source
Summary
Paul Christiano argues that a benign act-based AI agent will be robustly corrigible if designed correctly, and that corrigibility forms a broad basin of attraction toward acceptable outcomes rather than a narrow target. The post frames corrigibility broadly (encompassing error correction, human oversight, preference clarification, and resource control) and explains why this view underlies Christiano's overall optimism about AI alignment.
Key Points
- Corrigibility is defined broadly to include error correction, transparency, preference clarification, resource control, and self-perpetuating safe behavior.
- A benign act-based agent will be robustly corrigible if we want it to be, because it defers to human preferences.
- Corrigibility is not a narrow target but a wide basin of attraction: a sufficiently corrigible agent tends to become more corrigible over time.
- This framing implies that alignment researchers should focus on avoiding catastrophic failures that push systems out of the corrigibility basin, rather than on achieving perfect value specification.
- The post connects corrigibility to Christiano's broader act-based agent framework and his optimism about practical alignment approaches.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Corrigibility | Research Area | 59.0 |
Cached Content Preview
HTTP 200 | Fetched Apr 7, 2026 | 6 KB
Resource ID: 41ce82b75cb1cac3 | Stable ID: sid_bAnJJUsKt3