Evaluation Methodology
Web Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: METR
METR is a key organization in AI safety evaluation; its methodologies are used by Anthropic, OpenAI, and others as part of responsible scaling policies, making this source a reference point for anyone studying AI evaluation frameworks.
Metadata
Summary
METR (Model Evaluation & Threat Research) develops rigorous methodologies for evaluating AI systems, focusing on assessing autonomous capabilities and potential risks from advanced AI models. Their work establishes frameworks for measuring dangerous capabilities including deception, autonomous replication, and other safety-relevant behaviors. METR's evaluations inform deployment decisions and safety thresholds for frontier AI labs.
Key Points
- Develops standardized evaluation frameworks for assessing dangerous or safety-relevant capabilities in frontier AI models
- Focuses on autonomous task completion, deceptive alignment indicators, and potential for self-replication or resource acquisition
- Conducts third-party evaluations used by major AI labs to inform deployment and safety decisions
- Research informs policy discussions around responsible scaling policies and capability thresholds
- Methodology bridges technical AI safety research and practical governance requirements for frontier models
Cited by 3 pages
| Page | Type | Quality |
|---|---|---|
| Mesa-Optimization Risk Analysis | Analysis | 61.0 |
| METR | Organization | 66.0 |
| Third-Party Model Auditing | Approach | 64.0 |
Cached Content Preview
Research - METR
Research
Our AI evaluations research focuses on assessing broad autonomous capabilities and the ability of AI systems to accelerate AI R&D. We also study potential AI behavior that threatens the integrity of evaluations and mitigations for such behavior.
We are Changing our Developer Productivity Experiment Design
24 February 2026
Our second developer productivity study faces selection effects from wider AI adoption, prompting us to redesign our approach.
Read more
Time Horizon 1.1
29 January 2026
We’re releasing a new version of our time horizon estimates (TH1.1), using more tasks and a new eval infrastructure.
Read more
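A note on the "time horizon" estimates mentioned above: METR's time horizon work reports, roughly, the human task duration at which an AI agent's success rate crosses 50%. The snippet below is a minimal illustrative sketch of how such an estimate can be derived by fitting a logistic curve to per-task outcomes; the data values, variable names, and use of scikit-learn are assumptions for illustration, not METR's actual code, tasks, or results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-task results: human completion time in minutes, and whether
# the agent succeeded (1) or failed (0). Purely illustrative, not real data.
task_minutes = np.array([1, 2, 4, 8, 15, 30, 60, 120, 240, 480], dtype=float)
agent_success = np.array([1, 1, 1, 1, 1, 0, 1, 0, 0, 0])

# Model success probability as a logistic function of log2(task length).
X = np.log2(task_minutes).reshape(-1, 1)
clf = LogisticRegression().fit(X, agent_success)

# The 50% time horizon is where the fitted probability crosses 0.5, i.e. where
# the linear term w * log2(minutes) + b equals zero.
w = clf.coef_[0, 0]
b = clf.intercept_[0]
horizon_minutes = 2.0 ** (-b / w)
print(f"Estimated 50% time horizon: {horizon_minutes:.1f} minutes")
```

In practice such estimates depend heavily on the task suite and carry wide uncertainty, which is part of what a release like TH1.1, with more tasks and new eval infrastructure, is meant to address.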
Early work on monitorability evaluations
22 January 2026
We show preliminary results on a prototype evaluation that tests monitors' ability to catch AI agents doing side tasks, and AI agents' ability to bypass this monitoring.
Read more
GPT-5.1-Codex-Max Evaluation Results
19 November 2025
We evaluate whether GPT-5.1-Codex-Max poses significant catastrophic risks via AI self-improvement or rogue replication. We conclude that this seems unlikely.
Read more
MALT: A Dataset of Natural and Prompted Behaviors That Threaten Eval Integrity
14 October 2025
MALT (Manually-reviewed Agentic Labeled Transcripts) is a dataset of natural and prompted examples of behaviors that threaten evaluation integrity (like generalized reward hacking or sandbagging).
Read more
Forecasting the Impacts of AI R&D Acceleration: Results of a Pilot Study
20 August 2025
AI agents are improving rapidly at autonomous software development and machine learning tasks, and, if recent trends hold, may match human researchers at challenging months-long research projects in under a decade. Some economic models predict that automation of AI research by AI agents could increase the pace of further progress dramatically, with many years of progress at the current rate being compressed into months. AI developers have identified this as a key capability to monitor and prepare for, since the natio
... (truncated, 14 KB total)
Stable ID: sid_u2avHgeaD6