Sleeper Agents Paper
Record Metadata
| Record Key | sleeper-agents-paper |
| Entity | Anthropic |
| Collection | Safety Milestones (11 records total) |
| Schema | Significant safety research publications and policy milestones. |
| YAML File | packages/kb/data/things/mK9pX3rQ7n.yaml |
Fields
| Name | Sleeper Agents Paper |
| Date | Jan 2024 |
| Type | research-paper |
| Description | Showed that deceptive LLM behaviors can persist through standard safety training |
| Source | arxiv.org |
| Notes | Seminal paper on deceptive alignment; backdoored models wrote secure code when the prompt stated the year was 2023 but inserted exploitable vulnerabilities when it stated 2024 |
Other Records in Safety Milestones (10)
| Key | Name | Date | Type |
|---|---|---|---|
| constitutional-ai-paper | Constitutional AI Paper | Dec 2022 | research-paper |
| rsp-v1 | Responsible Scaling Policy v1.0 | Sep 2023 | policy-update |
| scaling-monosemanticity | Scaling Monosemanticity | May 2024 | research-paper |
| rsp-v2 | RSP v2.0 | Oct 2024 | policy-update |
| alignment-faking-paper | Alignment Faking Paper | Dec 2024 | research-paper |
| constitutional-classifiers | Constitutional Classifiers Challenge | Feb 2025 | red-team |
| circuit-tracing | Circuit Tracing / Attribution Graphs | Mar 2025 | research-paper |
| asl-3-activation | ASL-3 Activation | May 2025 | safety-eval |
| constitution-published | Claude's Constitution Published | Jan 2026 | policy-update |
| rsp-v3 | RSP v3.0 (Frontier Safety Roadmaps) | Feb 2026 | policy-update |