Longterm Wiki

Agent Foundations

Scalable Oversight · active

Theoretical foundations for reasoning about goal-directed AI systems (MIRI-style research).

First Proposed: 2014 (MIRI)
Cluster: Scalable Oversight

Tags

function:specification, scope:field

Sub-Areas (1)

Name | Description | Status | Orgs | Papers
Natural Abstractions | Hypothesis that natural abstractions generalize across observers, providing a basis for alignment. | active | 0 | 0