Evaluation Methodology
Web Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: METR
METR is a key organization in AI safety evaluation; its methodologies are used by Anthropic, OpenAI, and others as part of responsible scaling policies, making this source a reference point for anyone studying AI evaluation frameworks.
Metadata
Summary
METR (Model Evaluation & Threat Research) develops rigorous methodologies for evaluating AI systems, focusing on assessing autonomous capabilities and potential risks from advanced AI models. Their work establishes frameworks for measuring dangerous capabilities including deception, autonomous replication, and other safety-relevant behaviors. METR's evaluations inform deployment decisions and safety thresholds for frontier AI labs.
Key Points
- Develops standardized evaluation frameworks for assessing dangerous or safety-relevant capabilities in frontier AI models
- Focuses on autonomous task completion, deceptive alignment indicators, and potential for self-replication or resource acquisition
- Conducts third-party evaluations used by major AI labs to inform deployment and safety decisions
- Research informs policy discussions around responsible scaling policies and capability thresholds
- Methodology bridges technical AI safety research and practical governance requirements for frontier models
Cited by 3 pages
| Page | Type | Quality |
|---|---|---|
| Mesa-Optimization Risk Analysis | Analysis | 61.0 |
| METR | Organization | 66.0 |
| Third-Party Model Auditing | Approach | 64.0 |
Cached Content Preview
Research - METR
Research
Our AI evaluations research focuses on assessing broad autonomous capabilities and the ability of AI systems to accelerate AI R&D. We also study potential AI behavior that threatens the integrity of evaluations and mitigations for such behavior.
We are Changing our Developer Productivity Experiment Design
24 February 2026
Our second developer productivity study faces selection effects from wider AI adoption, prompting us to redesign our approach.
Read more
Time Horizon 1.1
29 January 2026
We’re releasing a new version of our time horizon estimates (TH1.1), using more tasks and a new eval infrastructure.
Read more
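A note on the "time horizon" estimates mentioned above: METR's time horizon work reports, roughly, the human task duration at which an AI agent's success rate crosses 50%. The snippet below is a minimal illustrative sketch of how such an estimate can be derived by fitting a logistic curve to per-task outcomes; the data values, variable names, and use of scikit-learn are assumptions for illustration, not METR's actual code, tasks, or results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-task results: human completion time in minutes, and whether
# the agent succeeded (1) or failed (0). Purely illustrative, not real data.
task_minutes = np.array([1, 2, 4, 8, 15, 30, 60, 120, 240, 480], dtype=float)
agent_success = np.array([1, 1, 1, 1, 1, 0, 1, 0, 0, 0])

# Model success probability as a logistic function of log2(task length).
X = np.log2(task_minutes).reshape(-1, 1)
clf = LogisticRegression().fit(X, agent_success)

# The 50% time horizon is where the fitted probability crosses 0.5, i.e. where
# the linear term w * log2(minutes) + b equals zero.
w = clf.coef_[0, 0]
b = clf.intercept_[0]
horizon_minutes = 2.0 ** (-b / w)
print(f"Estimated 50% time horizon: {horizon_minutes:.1f} minutes")
```

In practice such estimates depend heavily on the task suite and carry wide uncertainty, which is part of what a release like TH1.1, with more tasks and new eval infrastructure, is meant to address.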
Early work on monitorability evaluations
22 January 2026
We show preliminary results on a prototype evaluation that tests monitors' ability to catch AI agents doing side tasks, and AI agents' ability to bypass this monitoring.
Read more
GPT-5.1-Codex-Max Evaluation Results
19 November 2025
We evaluate whether GPT-5.1-Codex-Max poses significant catastrophic risks via AI self-improvement or rogue replication. We conclude that this seems unlikely.
Read more
MALT: A Dataset of Natural and Prompted Behaviors That Threaten Eval Integrity
14 October 2025
MALT (Manually-reviewed Agentic Labeled Transcripts) is a dataset of natural and prompted examples of behaviors that threaten evaluation integrity (like generalized reward hacking or sandbagging).
Read more
Forecasting the Impacts of AI R&D Acceleration: Results of a Pilot Study
20 August 2025
AI agents are improving rapidly at autonomous software development and machine learning tasks, and, if recent trends hold, may match human researchers at challenging months-long research projects in under a decade. Some economic models predict that automation of AI research by AI agents could increase the pace of further progress dramatically, with many years of progress at the current rate being compressed into months. AI developers have identified this as a key capability to monitor and prepare for, since the natio
... (truncated, 14 KB total)
Stable ID: sid_u2avHgeaD6