Theoretical Foundations (Overview)

Theoretical alignment research establishes the conceptual and mathematical foundations for safe AI.

Core Concepts:

  • Corrigibility: Systems that allow correction and shutdown
  • Goal Misgeneralization: When a system's learned goals match the intended goals during training but generalize differently outside it
  • Agent Foundations: Mathematical foundations of agency

Scalable Oversight:

  • Scalable Oversight: Supervising superhuman systems
  • Eliciting Latent Knowledge (ELK): Getting models to report what they know
  • AI Debate: Using AI to verify AI reasoning

Formal Approaches:

  • Formal Verification: Mathematical proofs of system properties
  • Provably Safe AI: Safety guarantees through formal methods
  • CIRL: Cooperative Inverse Reinforcement Learning, a game-theoretic framework in which an AI learns human preferences by cooperating with a human (see the sketch after this list)
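
A rough sketch of how CIRL is commonly formalized (the notation below follows the standard formulation and is an illustrative assumption, not content from this page): the human H and the robot R play a two-player game with identical payoffs,

$$
M = \big\langle S,\ \{A^{H}, A^{R}\},\ T(s' \mid s, a^{H}, a^{R}),\ \Theta,\ R(s, a^{H}, a^{R}; \theta),\ P_0,\ \gamma \big\rangle,
$$

where the reward parameters $\theta \in \Theta$ are observed only by the human; the robot must infer $\theta$ from the human's behavior while both players maximize the same expected discounted reward.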

Multi-Agent:

  • Cooperative AI: AI systems that cooperate with humans and each other

Related Pages

Approaches

  • Formal Verification (AI Safety)
  • Cooperative IRL (CIRL)
  • Cooperative AI
  • Agent Foundations

Safety Research

  • Corrigibility