Longterm Wiki

Scalable Oversight

Status: active

Research on supervising AI systems that approach or exceed human-level capabilities.

Organizations: 3
Key Papers: 2
Risks Addressed: 2
First Proposed: 2018 (Irving et al.)
Cluster: Scalable Oversight

Tags

function:specification, scope:field

Organizations (3)

Organization | Role
Anthropic | pioneer
Alignment Research Center | pioneer
Google DeepMind | active

Key Papers & Resources (2)

AI Safety via Debate (seminal)
Irving et al., 2018

Sub-Areas (2)

Name | Description | Status | Orgs | Papers
AI Safety via Debate | Using adversarial debate between AI systems to help humans evaluate complex claims. | active | 0 | 0
Eliciting Latent Knowledge | Getting AI systems to honestly report what they know, even when deception would be rewarded. | active | 1 | 1