Longterm Wiki

Credibility Rating

4/5 (High)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: OpenAI

This OpenAI blog post presents early empirical work on iterated amplification, a key scalable oversight proposal by Paul Christiano, relevant to anyone studying techniques for supervising AI systems beyond direct human evaluative capacity.

Metadata

Importance: 72/100 · blog post · primary source

Summary

OpenAI introduces iterated amplification, a scalable oversight technique where a human-AI system is progressively amplified through decomposition of complex tasks into simpler subproblems, enabling AI to learn goals that would otherwise be too difficult for humans to evaluate directly. The approach aims to maintain alignment even as AI capabilities scale beyond direct human oversight. It represents a core research direction for training AI systems on tasks where human feedback alone is insufficient.

Key Points

  • Iterated amplification decomposes hard tasks into easier subtasks, allowing humans to provide effective oversight by combining answers to simpler questions.
  • The technique aims to scale human supervision without sacrificing alignment, addressing the core challenge of evaluating superhuman AI performance.
  • It is closely related to Paul Christiano's theoretical work and serves as an empirical test of the amplification+distillation training loop.
  • The method iteratively alternates between amplification (human + AI assistant) and distillation (training a new model to match the amplified system); see the code sketch after this list.
  • Iterated amplification is positioned as complementary to debate as a mechanism for scalable oversight of advanced AI systems.
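
The amplify-then-distill loop in the points above can be written down compactly. The sketch below is illustrative rather than taken from the post: the `Human` and `Agent` interfaces and the `amplify` and `iterated_amplification` functions are assumed stand-ins for whatever decomposition interface and supervised training step a real implementation would use.

```python
from typing import List, Protocol, Tuple


class Agent(Protocol):
    """The fast learned model being trained (assumed interface)."""
    def answer(self, question: str) -> str: ...
    def fit(self, pairs: List[Tuple[str, str]]) -> "Agent": ...


class Human(Protocol):
    """The overseer, who only ever handles small pieces of the task."""
    def decompose(self, question: str) -> List[str]: ...
    def combine(self, question: str, subanswers: List[str]) -> str: ...


def amplify(human: Human, agent: Agent, question: str) -> str:
    # Amplification: the human splits a hard question into sub-questions the
    # current agent can answer, then combines the sub-answers. The composite
    # is slower than the agent alone but more capable than the human alone.
    subquestions = human.decompose(question)
    subanswers = [agent.answer(q) for q in subquestions]
    return human.combine(question, subanswers)


def iterated_amplification(human: Human, agent: Agent,
                           questions: List[str], rounds: int = 5) -> Agent:
    for _ in range(rounds):
        # The amplified human+agent system labels the training data...
        pairs = [(q, amplify(human, agent, q)) for q in questions]
        # ...and distillation trains a fast model to reproduce those answers,
        # which then assists the human in the next round.
        agent = agent.fit(pairs)
    return agent
```

The property this is meant to capture is that distillation only requires imitating the amplified system, so no one ever has to evaluate the top-level task directly; the human contributes only decompositions and combinations of sub-answers.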

Cited by 1 page

Page: Scalable Oversight | Type: Research Area | Quality: 68.0

Cached Content Preview

HTTP 200 · Fetched Apr 9, 2026 · 12 KB
Learning complex goals with iterated amplification | OpenAI
The Wayback Machine - http://web.archive.org/web/20260214232008/https://openai.com/index/learning-complex-goals-with-iterated-amplification/

 

OpenAI

October 22, 2018
Publication

Learning complex goals with iterated amplification

Read paper


We’re proposing an AI safety technique called iterated amplification that lets us specify complicated behaviors and goals that are beyond human scale, by demonstrating how to decompose a task into simpler sub-tasks, rather than by providing labeled data or a reward function. Although this idea is in its very early stages and we have only completed experiments on simple toy algorithmic domains, we’ve decided to present it in its preliminary state because we think it could prove to be a scalable approach to AI safety.
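
To make "decompose a task into simpler sub-tasks" concrete, here is a small toy in the spirit of the post's simple algorithmic domains; the specific task and code are illustrative assumptions, not from OpenAI. The answer to a question too large to verify at a glance is assembled entirely from sub-answers that are each easy to check.

```python
# Toy decomposition (illustrative, not from the OpenAI post).
# Hard task: sum a long list of numbers. A human cannot eyeball the full sum,
# but can easily verify "split the list in half" and "add two numbers", so the
# final answer is built only from individually checkable steps.

def answer(numbers):
    if len(numbers) <= 2:            # base case small enough to check directly
        return sum(numbers)
    mid = len(numbers) // 2
    left = answer(numbers[:mid])     # simpler sub-task 1
    right = answer(numbers[mid:])    # simpler sub-task 2
    return left + right              # the combining step is also easy to verify


print(answer(list(range(1, 101))))   # 5050, assembled from easy-to-check pieces
```

In iterated amplification the decomposition is supplied by a human (and eventually learned) rather than hard-coded as it is here, but the training signal rests on the same idea: correctness of the whole follows from small, checkable pieces.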

If we want to train an ML system to perform a task, we need a training signal: a way to evaluate how well it is doing in order to help it learn. For example, labels in supervised learning or rewards in reinforcement learning are training signals. The formalism of ML usually assumes a training signal is already present and focuses on learning from it, but in reality the training signal has to come from somewhere. If we don’t have a training signal we can’t learn the task, and if we have the wrong training signal, we can get unintended and sometimes dangerous behavior. Thus, it would be valuable both for learning new tasks and for AI safety to improve our ability to generate training signals.

How do we currently generate training signals? Sometimes, the goal we want can be evalua

... (truncated, 12 KB total)
Resource ID: ca07d6bcd57e7027 | Stable ID: sid_QyoXbdo9sZ