Longterm Wiki

Alignment Science


Record Metadata

Record Key: alignment-science
Entity: Anthropic
Collection: Research Areas (6 records total)
Schema: Major research initiatives and focus areas.
YAML File: packages/kb/data/things/mK9pX3rQ7n.yaml

Fields

Name: Alignment Science
Description: Scalable oversight, weak-to-strong generalization, robustness to jailbreaks
Started: May 2024
Notes: Led by Jan Leike after his departure from OpenAI's Superalignment team
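The fields above are backed by the YAML file listed in the record metadata. A minimal sketch of how that record might be laid out, assuming a flat key-value schema (the lowercase field names are an assumption, not confirmed by the source):

```yaml
# Hypothetical layout of packages/kb/data/things/mK9pX3rQ7n.yaml;
# the actual schema and field names may differ.
name: Alignment Science
description: Scalable oversight, weak-to-strong generalization, robustness to jailbreaks
started: May 2024
notes: Led by Jan Leike after his departure from OpenAI's Superalignment team
```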

Other Records in Research Areas (5)

Key                          | Name                          | Description                                                              | Team Size
mechanistic-interpretability | Mechanistic Interpretability  | Understanding neural network internals through reverse-engineering       | 50
constitutional-ai            | Constitutional AI             | Training AI systems to follow principles through self-critique and RLAIF |
responsible-scaling-policy   | Responsible Scaling Policy    | Framework for evaluating and mitigating risks at each capability level   |
sleeper-agents               | Sleeper Agents Research       | Investigating whether AI systems can maintain hidden behaviors through training |
ai-welfare                   | AI Welfare Research           | Investigating moral status and welfare considerations for AI systems     |