Iterated Distillation and Amplification
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: arXiv
Abstract
Many real world learning tasks involve complex or hard-to-specify objectives, and using an easier-to-specify proxy can lead to poor performance or misaligned behavior. One solution is to have humans provide a training signal by demonstrating or judging performance, but this approach fails if the task is too complicated for a human to directly evaluate. We propose Iterated Amplification, an alternative training strategy which progressively builds up a training signal for difficult problems by combining solutions to easier subproblems. Iterated Amplification is closely related to Expert Iteration (Anthony et al., 2017; Silver et al., 2017), except that it uses no external reward function. We present results in algorithmic environments, showing that Iterated Amplification can efficiently learn complex behaviors.
Cited by 2 pages
| Page | Type | Quality |
|---|---|---|
| AI Accident Risk Cruxes | Crux | 67.0 |
| Paul Christiano | Person | 39.0 |
Cached Content Preview
# Supervising strong learners by amplifying weak experts
Paul Christiano
OpenAI
paul@openai.com
Buck Shlegeris
bshlegeris@gmail.com
Dario Amodei
OpenAI
damodei@openai.com
Work done while at OpenAI.
###### Abstract
Many real world learning tasks involve complex or hard-to-specify objectives, and using an easier-to-specify proxy can lead to poor performance or misaligned behavior. One solution is to have humans provide a training signal by demonstrating or judging performance, but this approach fails if the task is too complicated for a human to directly evaluate. We propose Iterated Amplification, an alternative training strategy which progressively builds up a training signal for difficult problems by combining solutions to easier subproblems. Iterated Amplification is closely related to Expert Iteration (Anthony et al., [2017](https://ar5iv.labs.arxiv.org/html/1810.08575#bib.bib4 ""); Silver et al., [2017b](https://ar5iv.labs.arxiv.org/html/1810.08575#bib.bib22 "")), except that it uses no external reward function. We present results in algorithmic environments, showing that Iterated Amplification can efficiently learn complex behaviors.
## 1 Introduction
If we want to train an ML system to perform a task, we need to be able to evaluate how well it is doing. Whether our training signal takes the form of labels, rewards, or something else entirely, we need some way to generate that signal.
If our goal can be evaluated automatically, such as winning a game of Go, or if we have an algorithm that can generate examples of correct behavior, then generating a training signal is trivial. In these cases we might say that there is an “algorithmic” training signal.

Unfortunately, most useful tasks don’t have an algorithmic training signal. So in current applications of machine learning, humans often provide the training signal. This can be done by having a human demonstrate the task, for example labeling an image or teleoperating a robot, or by learning a reward function from human judgments. For these classes of tasks, we could say there is a “human” training signal.
However, there are harder tasks for which we can’t compute demonstrations or rewards even with human assistance, and for which we currently have no clear method to get a meaningful training signal. Consider making economic policy decisions, advancing the scientific frontier, or managing the security of a large network of computers. Some of these tasks are “beyond human scale” – a single human can’t perform them and can’t make sense of their massive observation space well enough to judge the behavior of an agent. It may be possible for a human to judge performance in the very long run (for example, by looking at economic growth over several years), but such long-term feedback is very slow to learn from. We currently have no way to learn how to perform such tasks much better than a human.
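The abstract’s core idea – building a training signal for a hard problem by combining a model’s answers to easier subproblems, then training the model to imitate the combined answer – can be sketched in a few lines. This is only an illustrative toy, not the paper’s implementation: the names `amplify`, `distill_step`, `decompose`, and `combine` are hypothetical, and the “trained model” here is just a lookup table standing in for supervised learning.

```python
def amplify(question, model, decompose, combine):
    """One amplification step: a weak overseer answers a hard question
    by splitting it into easier subquestions and delegating those to
    the current model, then combining the subanswers."""
    subquestions = decompose(question)
    subanswers = [model(q) for q in subquestions]
    return combine(question, subanswers)


def distill_step(questions, model, decompose, combine):
    """Use amplified answers as training targets; return a new 'model'
    that imitates them (a lookup table stands in for a learned net)."""
    targets = {q: amplify(q, model, decompose, combine) for q in questions}
    return targets.get
```

Iterating `distill_step` lets the model handle questions one decomposition level deeper each round, without any external reward function – the training signal comes entirely from decomposition and recombination.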
The overall situation is depicted in Table 1, which shows six different combinations of
... (truncated, 59 KB total)