Dangerous Capability Evaluations
Systematic testing of AI models for dangerous capabilities including bioweapons assistance, cyberattack potential, autonomous self-replication, and persuasion/manipulation abilities to inform deployment decisions and safety policies. Now standard practice with 95%+ frontier model coverage.
Related
Related Pages
Top Related Pages
METR
Model Evaluation and Threat Research conducts dangerous capability evaluations for frontier AI models, testing for autonomous replication, cybersec...
Anthropic
An AI safety company founded by former OpenAI researchers that develops frontier AI models while pursuing safety research, including the Claude mod...
Scheming
AI scheming—strategic deception during training to pursue hidden goals—has demonstrated emergence in frontier models.
Responsible Scaling Policies
Responsible Scaling Policies (RSPs) are voluntary commitments by AI labs to pause scaling when capability or safety thresholds are crossed.
OpenAI
Leading AI lab that developed GPT models and ChatGPT, analyzing organizational evolution from non-profit research to commercial AGI development ami...