Alignment Science
Record Metadata
| Field | Value |
|---|---|
| Record Key | alignment-science |
| Entity | Anthropic |
| Collection | Research Areas (6 records total) |
| Schema | Major research initiatives and focus areas. |
| YAML File | packages/kb/data/things/mK9pX3rQ7n.yaml |
Fields
| Field | Value |
|---|---|
| Name | Alignment Science |
| Description | Scalable oversight, weak-to-strong generalization, robustness to jailbreaks |
| Started | May 2024 |
| Notes | Led by Jan Leike after his departure from OpenAI's Superalignment team |
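The YAML File entry above points at the backing record on disk. A minimal sketch of what that file might contain, assuming the keys simply mirror the Fields table (the exact key names, casing, and nesting are assumptions, not taken from the repository):

```yaml
# Hypothetical layout of packages/kb/data/things/mK9pX3rQ7n.yaml.
# Key names and structure are assumptions based on the Fields table above.
key: alignment-science
entity: Anthropic
collection: research-areas
fields:
  name: Alignment Science
  description: Scalable oversight, weak-to-strong generalization, robustness to jailbreaks
  started: May 2024
  notes: Led by Jan Leike after his departure from OpenAI's Superalignment team
```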
Other Records in Research Areas (5)
| Key | Name | Description | Team Size |
|---|---|---|---|
| mechanistic-interpretability | Mechanistic Interpretability | Understanding neural network internals through reverse-engineering | 50 |
| constitutional-ai | Constitutional AI | Training AI systems to follow principles through self-critique and RLAIF | — |
| responsible-scaling-policy | Responsible Scaling Policy | Framework for evaluating and mitigating risks at each capability level | — |
| sleeper-agents | Sleeper Agents Research | Investigating whether AI systems can maintain hidden behaviors through training | — |
| ai-welfare | AI Welfare Research | Investigating moral status and welfare considerations for AI systems | — |