Sleeper Agents Paper

Record Metadata

Record Key	`sleeper-agents-paper`
Entity	Anthropic
Collection	Safety Milestones (11 records total)
Schema	Significant safety research publications and policy milestones.
YAML File	`packages/kb/data/things/mK9pX3rQ7n.yaml`

Fields

Name	Sleeper Agents Paper
Date	Jan 2024
Type	research-paper
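Description	Showed that deliberately trained deceptive (backdoored) LLM behaviors can persist through standard safety training
Source	arxiv.org
Notes	Seminal paper on deceptive alignment; the backdoored model wrote secure code when the prompt stated the year was 2023 but inserted exploitable vulnerabilities when it stated 2024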

Other Records in Safety Milestones (10)

Key	Name	Date	Type
constitutional-ai-paper	Constitutional AI Paper	Dec 2022	research-paper
rsp-v1	Responsible Scaling Policy v1.0	Sep 2023	policy-update
scaling-monosemanticity	Scaling Monosemanticity	May 2024	research-paper
rsp-v2	RSP v2.0	Oct 2024	policy-update
alignment-faking-paper	Alignment Faking Paper	Dec 2024	research-paper
constitutional-classifiers	Constitutional Classifiers Challenge	Feb 2025	red-team
circuit-tracing	Circuit Tracing / Attribution Graphs	Mar 2025	research-paper
asl-3-activation	ASL-3 Activation	May 2025	safety-eval
constitution-published	Claude's Constitution Published	Jan 2026	policy-update
rsp-v3	RSP v3.0 (Frontier Safety Roadmaps)	Feb 2026	policy-update