Theoretical Foundations (Overview)
Theoretical alignment research establishes the conceptual and mathematical foundations for safe AI.
Core Concepts:
- Corrigibility: Designing systems that accept correction and shutdown rather than resisting them
- Goal Misgeneralization: When a model competently pursues a learned goal that diverges from the intended goal, typically under distribution shift
- Agent Foundations: Mathematical theories of agency, decision-making, and embedded reasoning
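Corrigibility can be illustrated with a small expected-value calculation, loosely in the spirit of the "off-switch game" analysis: all numbers and function names below are assumed for illustration, not taken from the source.

```python
# Toy "off-switch" calculation: a robot uncertain about an action's value
# compares acting immediately, shutting down, and deferring to a human
# who vetoes harmful actions. All numbers here are illustrative.

def expected_value(robot_choice, true_utility):
    if robot_choice == "act":
        return true_utility          # commit regardless of consequences
    if robot_choice == "shutdown":
        return 0.0                   # safe but useless
    # "defer": the human sees the true utility and approves only if positive
    return max(true_utility, 0.0)

# The robot believes the action is worth +1 or -2 with equal probability.
outcomes = [1.0, -2.0]
for choice in ("act", "shutdown", "defer"):
    ev = sum(expected_value(choice, u) for u in outcomes) / len(outcomes)
    print(f"{choice}: {ev:+.1f}")
# defer (+0.5) beats shutdown (0.0) beats act (-0.5): under uncertainty,
# keeping the human in the loop is the robot's own best option.
```

The point of the toy model: the deferential policy wins precisely because the robot is uncertain about the action's true value, which is the core intuition behind corrigibility arguments.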
Scalable Oversight:
- Scalable Oversight: Techniques for reliably supervising systems more capable than their human overseers
- Eliciting Latent Knowledge (ELK): Getting models to report what they internally know, even when that differs from what an evaluator would reward
- AI Debate: Having AI systems argue opposing positions so that a weaker judge can evaluate claims it could not verify directly
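The debate idea can be sketched in a few lines. This is a hypothetical toy, not a real library API: the debaters, judge, and transcript format below are all illustrative stand-ins.

```python
# Toy sketch of the debate protocol: two debaters argue opposite answers,
# and a weak judge picks a side based only on the transcript.

def debate(question, debater_a, debater_b, judge, rounds=2):
    """Alternate arguments for a fixed number of rounds, then judge."""
    transcript = [("question", question)]
    for _ in range(rounds):
        transcript.append(("A", debater_a(question, transcript)))
        transcript.append(("B", debater_b(question, transcript)))
    return judge(transcript)  # verdict: "A" or "B"

# Stub debaters: A backs its claim with a checkable step, B does not,
# and the weak judge simply favors whichever side offers a check.
debater_a = lambda q, t: "claim: 17 is prime; check: no divisor in 2..4"
debater_b = lambda q, t: "claim: 17 is composite"
judge = lambda t: "A" if any("check:" in msg for _, msg in t[1:]) else "B"

print(debate("Is 17 prime?", debater_a, debater_b, judge))  # prints "A"
```

The hoped-for dynamic is that arguing for the truth is easier than arguing against it, so honest debaters win even before a judge who cannot verify the question alone.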
Formal Approaches:
- Formal Verification: Machine-checked mathematical proofs that a system satisfies specified properties
- Provably Safe AI: Safety guarantees established through formal methods rather than empirical testing alone
- CIRL (Cooperative Inverse Reinforcement Learning): Framing alignment as a two-player game in which the AI is uncertain about the human's reward function and must infer it from the human's behavior
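CIRL is usually formalized (following Hadfield-Menell et al.) as a two-player Markov game with a shared reward but asymmetric information; the tuple below states that standard form, with notation assumed from that line of work.

```latex
% CIRL as a two-player game: both players maximize the same reward R,
% but only the human observes the reward parameter \theta \sim P_0.
\mathcal{M} = \big\langle S,\; \{A_H, A_R\},\; T(s' \mid s, a_H, a_R),\;
    \Theta,\; R(s, a_H, a_R; \theta),\; P_0(\theta),\; \gamma \big\rangle
```

Because the robot shares the human's reward but not the human's knowledge of \(\theta\), its optimal policy involves learning from human actions rather than acting on a fixed objective.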
Multi-Agent:
- Cooperative AI: Designing AI systems that cooperate effectively with humans and with each other
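A classic minimal illustration of cooperative dynamics is the iterated prisoner's dilemma, where reciprocating agents sustain mutual cooperation. The payoffs and agent code below are illustrative, not taken from the source.

```python
# Toy iterated prisoner's dilemma: two tit-for-tat agents sustain
# mutual cooperation across repeated rounds.

PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(opponent_history):
    # Cooperate first, then mirror the opponent's previous move.
    return "C" if not opponent_history else opponent_history[-1]

def play(rounds=10):
    hist_a, hist_b = [], []          # moves each player has made so far
    score_a = score_b = 0
    for _ in range(rounds):
        a = tit_for_tat(hist_b)      # A reacts to B's past moves
        b = tit_for_tat(hist_a)      # B reacts to A's past moves
        pa, pb = PAYOFF[(a, b)]
        score_a, score_b = score_a + pa, score_b + pb
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b

print(play())  # prints (30, 30): cooperation every round
```

Neither agent ever defects, so both collect the mutual-cooperation payoff every round; research in cooperative AI studies how to get such outcomes in far richer settings.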