Longterm Wiki

Constitutional Classifiers Challenge

Record Metadata

Record Key: constitutional-classifiers
Entity: Anthropic
Collection: Safety Milestones (11 records total)
Schema: Significant safety research publications and policy milestones.
YAML File: packages/kb/data/things/mK9pX3rQ7n.yaml

Fields

Name: Constitutional Classifiers Challenge
Date: Feb 2025
Type: red-team
Description: 300K+ messages and ~3,700 hours of red-teaming effort; 4 participants found jailbreaks, 1 of which was a universal jailbreak
Source: anthropic.com
Notes: $55,000 paid to winners

Other Records in Safety Milestones (10)

Key | Name | Date | Type
constitutional-ai-paper | Constitutional AI Paper | Dec 2022 | research-paper
rsp-v1 | Responsible Scaling Policy v1.0 | Sep 2023 | policy-update
sleeper-agents-paper | Sleeper Agents Paper | Jan 2024 | research-paper
scaling-monosemanticity | Scaling Monosemanticity | May 2024 | research-paper
rsp-v2 | RSP v2.0 | Oct 2024 | policy-update
alignment-faking-paper | Alignment Faking Paper | Dec 2024 | research-paper
circuit-tracing | Circuit Tracing / Attribution Graphs | Mar 2025 | research-paper
asl-3-activation | ASL-3 Activation | May 2025 | safety-eval
constitution-published | Claude's Constitution Published | Jan 2026 | policy-update
rsp-v3 | RSP v3.0 (Frontier Safety Roadmaps) | Feb 2026 | policy-update