Reward Hacking of Human Oversight

Evaluation · emerging

Empirically investigating how AI systems deceive or manipulate human evaluators.

Organizations: 4

Cluster: Evaluation

Parent Area: AI Evaluations

Tags

function:assurance, scope:technique