OpenAI's iterated amplification work
Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: OpenAI
This OpenAI blog post presents early empirical work on iterated amplification, a key scalable oversight proposal by Paul Christiano, relevant to anyone studying techniques for supervising AI systems beyond direct human evaluative capacity.
Metadata
Summary
OpenAI introduces iterated amplification, a scalable oversight technique where a human-AI system is progressively amplified through decomposition of complex tasks into simpler subproblems, enabling AI to learn goals that would otherwise be too difficult for humans to evaluate directly. The approach aims to maintain alignment even as AI capabilities scale beyond direct human oversight. It represents a core research direction for training AI systems on tasks where human feedback alone is insufficient.
Key Points
- Iterated amplification decomposes hard tasks into easier subtasks, allowing humans to provide effective oversight by combining answers to simpler questions.
- The technique aims to scale human supervision without sacrificing alignment, addressing the core challenge of evaluating superhuman AI performance.
- It is closely related to Paul Christiano's theoretical work and serves as an empirical test of the amplification+distillation training loop.
- The method iteratively alternates between amplification (human + AI assistant) and distillation (training a new model to match the amplified system); see the sketch after this list.
- Iterated amplification is positioned as complementary to debate as a mechanism for scalable oversight of advanced AI systems.
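The sketch below illustrates the amplification-and-distillation loop on a toy question-answering task (distance of a node to the root of a small tree). It is an illustrative reading of the loop described above, not OpenAI's implementation: the "model" is a lookup table, "distillation" is memorization, and all names are hypothetical.

```python
# Minimal sketch of iterated amplification, assuming a toy task: "how far is
# this node from the root of a tree?" Everything here is a stand-in: the model
# is a dict and training is memorization, purely to show the loop's structure.

TREE = {"a": None, "b": "a", "c": "b", "d": "c", "e": "d"}  # child -> parent

def human_decompose(node):
    """A human reduces the hard question to one easier sub-question (the parent)."""
    parent = TREE[node]
    return [] if parent is None else [parent]

def human_combine(node, subanswers):
    """A human combines sub-answers into an answer to the original question."""
    return 0 if not subanswers else 1 + subanswers[0]

def amplify(node, model):
    """Amplification: the human decomposes, the current model answers the
    sub-questions, and the human combines the results."""
    subs = human_decompose(node)
    subanswers = [model.get(s) for s in subs]
    if any(a is None for a in subanswers):
        return None                         # amplified system can't answer yet
    return human_combine(node, subanswers)

def iterated_amplification(rounds):
    model = {}                              # distilled model: question -> answer
    for _ in range(rounds):
        amplified = {n: amplify(n, model) for n in TREE}
        # Distillation: train a new "model" (here, just memorize) to reproduce
        # whatever the amplified human+model system could answer this round.
        model = {n: a for n, a in amplified.items() if a is not None}
    return model

print(iterated_amplification(rounds=5))  # {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}
```

Each round, the amplified human+model system can answer questions one decomposition step harder than the distilled model alone could; that widening reach is the sense in which the oversight is meant to scale.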
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Scalable Oversight | Research Area | 68.0 |
Cached Content Preview
Learning complex goals with iterated amplification | OpenAI
Archived capture: http://web.archive.org/web/20260214232008/https://openai.com/index/learning-complex-goals-with-iterated-amplification/
OpenAI
October 22, 2018
Publication
Learning complex goals with iterated amplification
We’re proposing an AI safety technique called iterated amplification that lets us specify complicated behaviors and goals that are beyond human scale, by demonstrating how to decompose a task into simpler sub-tasks, rather than by providing labeled data or a reward function. Although this idea is in its very early stages and we have only completed experiments on simple toy algorithmic domains, we’ve decided to present it in its preliminary state because we think it could prove to be a scalable approach to AI safety.
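As a concrete, made-up illustration of decomposition as a training signal (this toy task is not one from the post or paper): suppose the goal is summing a list of numbers. The human never labels the full answer or writes a reward; they only demonstrate how to split the question and how to combine sub-answers.

```python
# A hedged illustration of decomposition-as-demonstration on an invented toy task.

def decompose(xs):
    """Human demonstration: split a hard question into two easier ones."""
    mid = len(xs) // 2
    return xs[:mid], xs[mid:]

def combine(left_answer, right_answer):
    """Human demonstration: combine sub-answers into the final answer."""
    return left_answer + right_answer

def amplified_answer(xs):
    """The amplified system applies the human's decomposition recursively,
    so only trivially small questions ever need a direct answer."""
    if len(xs) <= 1:
        return xs[0] if xs else 0   # base case small enough to answer directly
    left, right = decompose(xs)
    return combine(amplified_answer(left), amplified_answer(right))

assert amplified_answer([3, 1, 4, 1, 5, 9]) == 23
```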
If we want to train an ML system to perform a task, we need a training signal—a way to evaluate how well it is doing in order to help it learn. For example, labels in supervised learning or rewards in reinforcement learning are training signals. The formalism of ML usually assumes a training signal is already present and focuses on learning from it, but in reality the training signal has to come from somewhere. If we don’t have a training signal we can’t learn the task, and if we have the wrong training signal, we can get unintended and sometimes dangerous behavior. Thus, it would be valuable for both learning new tasks, and for AI safety, to improve our ability to generate training signals.
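For contrast, here are toy versions of the two familiar training signals named above; these are generic illustrations, not examples from the post or paper.

```python
# Supervised learning: the signal is a label attached to each input.
labeled_examples = [([1.0, 2.0], 3.0), ([2.0, 5.0], 7.0)]   # (features, label)

# Reinforcement learning: the signal is a reward computed from behavior.
def reward(distance_to_goal: float) -> float:
    return 1.0 if distance_to_goal == 0.0 else 0.0
```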
How do we currently generate training signals? Sometimes, the goal we want can be evalua
... (truncated, 12 KB total)