Longterm Wiki
Citation·page:anthropic:fn36

Anthropic - Footnote 36

Verdict: partial · 90%
1 check · 4/3/2026

The source does not explicitly state that Anthropic described this as the 'first empirical example' of alignment faking without training. It only mentions that the phenomenon wasn't explicitly programmed into the models. The source does not contain the critics' argument that the behaviors themselves indicate unresolved alignment challenges.

Our claim


No record data available.

Source evidence

1 src · 1 check
partial · 90% · Haiku 4.5 · 4/3/2026

Note: The source does not explicitly state that Anthropic described this as the 'first empirical example' of alignment faking without training. It only mentions that the phenomenon wasn't explicitly programmed into the models. The source does not contain the critics' argument that the behaviors themselves indicate unresolved alignment challenges.

Case № page:anthropic:fn36 · Filed 4/3/2026 · Confidence 90%