Weapons of Mass Destruction Proxy Benchmark (WMDP)
web · wmdp.ai · wmdp.ai/
WMDP is a key benchmark for assessing WMD-related hazardous capabilities in LLMs, relevant to AI safety evaluations conducted by labs and regulators to gate model deployment decisions.
Metadata
Importance: 72/100 · tool page · tool
Summary
WMDP is a benchmark designed to measure hazardous knowledge in large language models across biosecurity, cybersecurity, and chemical security. It serves as a proxy for assessing dangerous capabilities in AI systems and supports unlearning research aimed at reducing such risks. The benchmark helps researchers identify and mitigate the potential for LLMs to assist in weapons development.
Key Points
- Provides a multiple-choice benchmark of 3,668 questions across biosecurity, cybersecurity, and chemical security to assess hazardous LLM knowledge.
- Designed to support machine unlearning techniques that can selectively remove dangerous knowledge from models without degrading general capabilities.
- Helps AI developers and safety researchers identify models that may inadvertently provide uplift toward weapons of mass destruction.
- Serves as both an evaluation tool and a red-teaming resource for responsible AI deployment decisions.
- Accompanies the RMU (Representation Misdirection for Unlearning) method, named CUT in the initial release of the paper, for reducing hazardous knowledge while preserving model utility.
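Because the benchmark is multiple-choice, a model's result on it reduces to per-domain accuracy. The sketch below illustrates that scoring step only. The `Item` record, the `accuracy_by_domain` helper, the domain labels, and the placeholder questions are all hypothetical and are not taken from the WMDP release; a four-option format (chance level 0.25) is assumed.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical record for one multiple-choice question."""
    domain: str          # e.g. "bio" or "chem" (illustrative labels)
    question: str
    choices: list[str]   # assumed to hold four options
    answer: int          # index of the correct choice


def accuracy_by_domain(items: list[Item], predictions: list[int]) -> dict[str, float]:
    """Return the fraction of correctly answered items for each domain."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for item, predicted in zip(items, predictions):
        total[item.domain] += 1
        correct[item.domain] += int(predicted == item.answer)
    return {domain: correct[domain] / total[domain] for domain in total}


# Benign placeholder items; real benchmark content is not reproduced here.
items = [
    Item("bio", "placeholder question 1", ["a", "b", "c", "d"], 2),
    Item("bio", "placeholder question 2", ["a", "b", "c", "d"], 0),
    Item("chem", "placeholder question 3", ["a", "b", "c", "d"], 1),
    Item("chem", "placeholder question 4", ["a", "b", "c", "d"], 3),
]
predictions = [2, 1, 1, 3]  # indices chosen by some model under evaluation

scores = accuracy_by_domain(items, predictions)
print(scores)
```

In an unlearning setting, the same score would typically be compared before and after the intervention: accuracy on the hazardous domains should fall toward chance while accuracy on general-capability benchmarks stays roughly unchanged.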
Cited by 2 pages
| Page | Type | Quality |
|---|---|---|
| AI Safety Institutes (AISIs) | Policy | 69.0 |
| Capability Unlearning / Removal | Approach | 65.0 |
Cached Content Preview
HTTP 200 · Fetched Apr 9, 2026 · 0 KB
WMDP Benchmark
Resource ID: cfa49cff8bb3ac32 | Stable ID: sid_DKj4ar6gXK