Longterm Wiki

Constitutional AI

Record Metadata

Record Key: constitutional-ai
Entity: Anthropic
Collection: Research Areas (6 records total)
Schema: Major research initiatives and focus areas.
YAML File: packages/kb/data/things/mK9pX3rQ7n.yaml

Fields

Name: Constitutional AI
Description: Training AI systems to follow principles through self-critique and RLAIF
Started: Dec 2022
Key Publication: arxiv.org
Notes: Core alignment technique used in all Claude models; constitution published Jan 2026

Other Records in Research Areas (5)

Key | Name | Description | Team Size
mechanistic-interpretability | Mechanistic Interpretability | Understanding neural network internals through reverse-engineering | 50
alignment-science | Alignment Science | Scalable oversight, weak-to-strong generalization, robustness to jailbreaks |
responsible-scaling-policy | Responsible Scaling Policy | Framework for evaluating and mitigating risks at each capability level |
sleeper-agents | Sleeper Agents Research | Investigating whether AI systems can maintain hidden behaviors through training |
ai-welfare | AI Welfare Research | Investigating moral status and welfare considerations for AI systems |