Mechanistic Interpretability
Record Metadata
| Record Key | mechanistic-interpretability |
| Entity | Anthropic |
| Collection | Research Areas (6 records total) |
| Schema | Major research initiatives and focus areas. |
| YAML File | packages/kb/data/things/mK9pX3rQ7n.yaml |
Fields
| Name | Mechanistic Interpretability |
| Description | Understanding neural network internals through reverse-engineering |
| Team Size | 50 |
| Started | Jan 2021 |
| Key Publication | transformer-circuits.pub |
| Notes | Led by Chris Olah; MIT Tech Review 2026 Breakthrough Technology; 34M features identified |
Other Records in Research Areas (5)
| Key | Name | Description | Team Size |
|---|---|---|---|
| constitutional-ai | Constitutional AI | Training AI systems to follow principles through self-critique and RLAIF | — |
| alignment-science | Alignment Science | Scalable oversight, weak-to-strong generalization, robustness to jailbreaks | — |
| responsible-scaling-policy | Responsible Scaling Policy | Framework for evaluating and mitigating risks at each capability level | — |
| sleeper-agents | Sleeper Agents Research | Investigating whether AI systems can maintain hidden behaviors through training | — |
| ai-welfare | AI Welfare Research | Investigating moral status and welfare considerations for AI systems | — |