AI Alignment: Why It's Hard, and Where to Start
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: MIRI
A foundational introductory talk by Eliezer Yudkowsky (MIRI) presenting the core framing of AI alignment as a technical problem, suitable as an entry point for researchers new to the field.
Metadata
Summary
Eliezer Yudkowsky's 2016 Stanford talk introducing the AI alignment problem, covering why coherent advanced AI systems imply utility functions, key technical subproblems (low-impact agents, corrigibility, stable goals under self-modification), and why alignment is both necessary and difficult. The talk also discusses lessons from analogous engineering fields and provides entry points for researchers new to the field.
Key Points
- Coherent decision-making agents implicitly have utility functions, making goal specification and alignment a fundamental technical challenge.
- Key alignment subproblems include low-impact agents, interruptibility/corrigibility (suspend buttons), and maintaining stable goals through self-modification.
- Alignment is hard because small misspecifications in goals can lead to catastrophic outcomes at high capability levels.
- Lessons from NASA and cryptography suggest that safety-critical systems require rigorous theoretical foundations before deployment, not just empirical iteration.
- Provides an accessible overview and reading list for researchers looking to enter the AI alignment field.
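The first key point — that coherent preferences must be representable by a utility function — is often motivated by the classic "money pump" argument: an agent with cyclic preferences can be charged a small fee at each trade and drained indefinitely. A minimal illustrative sketch (not from the talk; all items and numbers are hypothetical):

```python
# Hypothetical money-pump sketch: an agent with cyclic (incoherent)
# preferences A > B > C > A pays a small fee for each "upgrade" it
# strictly prefers, and so can be drained of money indefinitely --
# the standard argument that coherent preferences are representable
# by a utility function.

# Cyclic preference relation: prefers[(x, y)] means the agent trades y for x.
prefers = {("A", "B"): True, ("B", "C"): True, ("C", "A"): True}

def money_pump(start_item, start_money, fee, rounds):
    """Repeatedly offer the agent the item it prefers to its current one."""
    item, money = start_item, start_money
    offer_for = {"B": "A", "C": "B", "A": "C"}  # which upgrade is offered
    for _ in range(rounds):
        offer = offer_for[item]
        if prefers.get((offer, item)):       # agent strictly prefers the offer
            item, money = offer, money - fee  # ...so it pays the fee to trade
    return item, money

item, money = money_pump("C", 1.00, 0.01, 30)
print(item, round(money, 2))  # after 30 trades the agent holds "C" again, 0.30 poorer
```

Because the preference cycle has length three, thirty trades return the agent to its starting item while it has paid thirty fees — no consistent utility function over {A, B, C} can produce this behavior.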
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| AI Doomer Worldview | Concept | 38.0 |
Cached Content Preview
AI Alignment: Why It’s Hard, and Where to Start - Machine Intelligence Research Institute
December 28, 2016
Eliezer Yudkowsky
Back in May, I gave a talk at Stanford University for the Symbolic Systems Distinguished Speaker series, titled “The AI Alignment Problem: Why It’s Hard, And Where To Start.” The video for this talk is now available on YouTube:
We have an approximately complete transcript of the talk and Q&A session here, slides here, and notes and references here. You may also be interested in a shorter version of this talk I gave at NYU in October, “Fundamental Difficulties in Aligning Advanced AI.”
In the talk, I introduce some open technical problems in AI alignment and discuss the bigger picture into which they fit, as well as what it’s like to work in this relatively new field. Below, I’ve provided an abridged transcript of the talk, with some accompanying slides.
Talk outline:
1. Agents and their utility functions
1.1. Coherent decisions imply a utility function
1.2. Filling a cauldron
2. Some AI alignment subproblems
2.1. Low-impact agents
2.2. Agents with suspend buttons
2.3. Stable goals in self-modification
3. Why expect difficulty?
3.1. Why is alignment necessary?
3.2. Why is alignment hard?
3.3. Lessons from NASA and cryptography
4. Where we are now
4.1. Recent topics
4.2. Older work and basics
4.3. Where to start
Agents and their utility functions
In this talk, I’m going to try to answer the frequently asked question, “Just what is it that you do all day long?” We are concerned with the theory of artificial intelligences that are advanced beyond the present day, and that make sufficiently high-quality decisions in the service of whatever goals they may have been programmed with to be objects of concern.
Coherent decisions imply a utility function
The classic initial stab at this was taken by Isaac Asimov with the Three Laws of Robotics, the first of which is: “A robot may not injure a human being or, through inaction, allow a human being to come to harm.”
And as Peter Norvig observed, the other laws don’t matter—because there will always be some tiny possibility that a human being could come to harm.
Artificial Intelligence: A Modern Approach has a final chapter that asks, “Well, what if we succeed? What if the AI project actually works?” and observes, “We don’t want our robots to prevent a human from crossing the street because of the non-zero chance of harm.”
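Norvig’s observation can be made concrete: under a literal reading of the First Law, an action is forbidden whenever it carries any nonzero probability of human harm — and since every real-world action (including inaction) does, the agent is left with nothing it may do. A minimal sketch of this point (the probabilities and action names are hypothetical, for illustration only):

```python
# Hypothetical illustration of Norvig's point: if any nonzero probability
# of human harm forbids an action (Asimov's First Law read literally),
# then no real-world action survives the filter.

def permitted(actions, harm_prob):
    """Return the actions allowed under a literal First Law reading:
    an action is forbidden if its probability of human harm is nonzero."""
    return [a for a in actions if harm_prob[a] == 0.0]

# Every real-world option carries some tiny chance of harm.
harm_prob = {
    "let_human_cross_street": 1e-6,
    "serve_coffee": 1e-9,
    "do_nothing": 1e-7,  # inaction can also lead to harm
}

print(permitted(harm_prob.keys(), harm_prob))  # → [] : nothing is permitted
```

This is why trading off small harm probabilities against other values — rather than treating any nonzero probability as an absolute prohibition — requires something like a utility function in the first place.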
To begin with, I’d like to explain the truly basic reason why the three laws aren’t even on the table—and that is because they’re not
... (truncated, 55 KB total)