Longterm Wiki

AI Alignment

Technical approaches to ensuring AI systems pursue intended goals and remain aligned with human values throughout training and deployment. Current methods show promise but face fundamental scalability challenges; in one evaluation, oversight success drops to 52% at a 400-Elo capability gap.
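To put the 400-Elo figure in perspective, here is a minimal sketch assuming the standard Elo expected-score formula (the function name `elo_win_prob` is illustrative, not from this wiki). Under that formula, a side rated 400 points lower wins only about 9% of head-to-head contests, which is why a 52% oversight success rate at that gap signals a serious challenge.

```python
# Standard Elo expected-score formula (assumption: the page's
# "400-Elo capability gap" refers to conventional Elo ratings).
def elo_win_prob(gap: float) -> float:
    """Probability that the lower-rated side prevails, given a rating gap."""
    return 1.0 / (1.0 + 10.0 ** (gap / 400.0))

# At a 400-point gap, the weaker side's expected score is 1/11, about 9%.
print(round(elo_win_prob(400), 3))
```

This is only a back-of-the-envelope framing; the 52% figure itself comes from the evaluation the page cites, not from this formula.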

Related Pages

Analysis

Alignment Robustness Trajectory Model
Model Organisms of Misalignment
AI Watch

Approaches

Weak-to-Strong Generalization

Concepts

AI Welfare and Digital Minds
Agentic AI
Large Language Models
Openclaw Matplotlib Incident 2026

Other

Marc Andreessen
Scalable Oversight
RLHF
Eliezer Yudkowsky

Key Debates

AI Accident Risk Cruxes
Why Alignment Might Be Hard
AI Safety Solution Cruxes
Why Alignment Might Be Easy

Policy

New York RAISE Act
Council of Europe Framework Convention on Artificial Intelligence

Historical

The MIRI Era
Mainstream Era

Tags

alignment, scalable-oversight, rlhf, deceptive-alignment, safety-research