AI Evaluations
EvaluationactiveSystematic testing and measurement of AI system capabilities, alignment, and safety properties.
Organizations
4
Risks Addressed
2
Cluster: Evaluation
Tags
function:assurancescope:field
Organizations4
| Organization | Role |
|---|---|
| Anthropic | active |
| Alignment Research Center | active |
| Google DeepMind | active |
| OpenAI | active |
Sub-Areas10
| Name | Status | Orgs | Papers |
|---|---|---|---|
| Alignment EvaluationsTesting whether AI systems are actually aligned, not just capable of appearing aligned. | active | 0 | 0 |
| Backdoor DetectionDetecting adversarially implanted vulnerabilities in model weights. | active | 0 | 0 |
| Capability ElicitationMethods for discovering hidden or latent capabilities in AI systems. | active | 0 | 0 |
| Control EvaluationsStress-testing systems designed to constrain AI behavior; monitoring for collusion. | emerging | 0 | 0 |
| Dangerous Capability EvaluationsTesting AI systems for CBRN, cyber, autonomy, and other dangerous capabilities. | active | 0 | 0 |
| Evaluation AwarenessStudying how AI systems might game evaluations by detecting when they are being tested. | active | 0 | 0 |
| Red TeamingAdversarial testing of AI systems to discover failure modes, both manual and automated. | active | 3 | 0 |
| Reward Hacking of Human OversightEmpirically investigating how AI systems deceive or manipulate human evaluators. | emerging | 0 | 0 |
| Scheming / Deception DetectionBehavioral and mechanistic tests for detecting deceptive behavior in AI systems. | active | 0 | 0 |
| Sleeper Agent DetectionDetecting planted backdoors and conditional misbehavior in trained models. | active | 0 | 1 |