Longterm Wiki

Reward Hacking of Human Oversight

Evaluation · emerging
Empirically investigating how AI systems deceive or manipulate human evaluators.
Organizations: 4
Cluster: Evaluation
Parent Area: AI Evaluations

Tags

function:assurance, scope:technique