Longterm Wiki

Scheming / Deception Detection

Evaluation · active

Behavioral and mechanistic tests for detecting deceptive behavior in AI systems.

Cluster: Evaluation
Parent Area: AI Evaluations

Tags

function:assurance, scope:technique

Sub-Areas (1)

Name | Status | Orgs | Papers
Alignment Faking Experiments: Studying when and why AI systems pretend to be aligned during testing. | emerging | 0 | 1