Longterm Wiki

Circuit Tracing / Attribution Graphs

Record Metadata

Record Key: circuit-tracing
Entity: Anthropic
Collection: Safety Milestones (11 records total)
Schema: Significant safety research publications and policy milestones.
YAML File: packages/kb/data/things/mK9pX3rQ7n.yaml

Fields

Name: Circuit Tracing / Attribution Graphs
Date: Mar 2025
Type: research-paper
Description: Showed Claude has a shared conceptual space where reasoning happens before language translation
Source: transformer-circuits.pub

Other Records in Safety Milestones (10)

Key | Name | Date | Type
constitutional-ai-paper | Constitutional AI Paper | Dec 2022 | research-paper
rsp-v1 | Responsible Scaling Policy v1.0 | Sep 2023 | policy-update
sleeper-agents-paper | Sleeper Agents Paper | Jan 2024 | research-paper
scaling-monosemanticity | Scaling Monosemanticity | May 2024 | research-paper
rsp-v2 | RSP v2.0 | Oct 2024 | policy-update
alignment-faking-paper | Alignment Faking Paper | Dec 2024 | research-paper
constitutional-classifiers | Constitutional Classifiers Challenge | Feb 2025 | red-team
asl-3-activation | ASL-3 Activation | May 2025 | safety-eval
constitution-published | Claude's Constitution Published | Jan 2026 | policy-update
rsp-v3 | RSP v3.0 (Frontier Safety Roadmaps) | Feb 2026 | policy-update