Reward Hacking of Human Oversight
EvaluationemergingEmpirically investigating how AI systems deceive or manipulate human evaluators.
Cluster: Evaluation
Parent Area: AI Evaluations
Tags
function:assurancescope:technique
Empirically investigating how AI systems deceive or manipulate human evaluators.