Scheming / Deception Detection
Status: active
Behavioral and mechanistic tests for detecting deceptive behavior in AI systems.
Cluster: Evaluation
Parent Area: AI Evaluations
Tags
function:assurance, scope:technique
Sub-Areas (1)
| Name | Status | Orgs | Papers |
|---|---|---|---|
| Alignment Faking Experiments: Studying when and why AI systems pretend to be aligned during testing. | emerging | 0 | 1 |