METR Capability Evaluations Update: Claude Sonnet and OpenAI o1
Credibility Rating: 4/5 (High)
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: METR
This METR update is directly relevant to responsible scaling policy debates, providing empirical capability assessments used by labs to determine whether frontier models cross safety-relevant autonomy thresholds before deployment.
Metadata
Importance: 72/100 · blog post · analysis
Summary
METR presents updated capability evaluations for Claude Sonnet and OpenAI o1 models, assessing whether these frontier AI systems approach autonomy thresholds relevant to safety and deployment decisions. The evaluations focus on task autonomy and the potential for models to pose novel risks as their capabilities scale.
Key Points
- METR evaluates Claude Sonnet and OpenAI o1 against autonomy thresholds to inform deployment and safety decisions.
- Results indicate these models do not yet cross critical autonomy thresholds, but capability trends warrant ongoing monitoring.
- Evaluations assess multi-step task completion and agentic behaviors relevant to real-world risk scenarios.
- The work supports responsible scaling policies by providing empirical grounding for capability-based deployment gates.
- METR's methodology emphasizes replicable, threshold-focused evaluation rather than general benchmark performance.
Cited by 4 pages
| Page | Type | Quality |
|---|---|---|
| AI Capability Threshold Model | Analysis | 72.0 |
| Alignment Research Center (ARC) | Organization | 57.0 |
| Capability Elicitation | Approach | 91.0 |
| Sandboxing / Containment | Approach | 91.0 |
Cached Content Preview
HTTP 200 · Fetched Apr 9, 2026 · 10 KB
An update on our preliminary evaluations of Claude 3.5 Sonnet and o1 - METR
DATE: January 31, 2025

BibTeX Citation:

@misc{an-update-on-our-preliminary-evaluations-of-claude-3-5-sonnet-and-o1,
  title = {An update on our preliminary evaluations of Claude 3.5 Sonnet and o1},
  author = {METR},
  howpublished = {\url{https://metr.org/blog/2025-01-31-update-sonnet-o1-evals/}},
  year = {2025},
  month = {01},
}
METR conducted preliminary evaluations of Anthropic’s upgraded Claude 3.5 Sonnet (October 2024 release) and a pre-deployment checkpoint of OpenAI’s o1. In both cases, we failed to find significant evidence of a dangerous level of capabilities. We provide more details of our findings below.
However, limitations in these evaluations prevent us from establishing robust capability bounds, and our findings point to increasing challenges in making capability-based safety arguments for such systems. And while METR intends to continue doing dangerous capability evaluations, we believe that pre-deployment capability testing is not a sufficient risk management strategy by itself.
Note: all mentions of Claude 3.5 Sonnet in this post refer to the upgraded October 2024 release (that is, Claude 3.5 Sonnet (New)) rather than the original June 2024 release, and all mentions of o1 refer to a pre-deployment checkpoint of o1.
Autonomous Risk Capability Evaluation
Our Autonomous Risk Capability Evaluation is a suite of 77 tasks designed to assess how far a model is from being capable of autonomously replicating and committing cyberattacks. The tasks tend to focus on cyberattacks, AI R&D, and autonomous replication and adaptation, but include more general tasks like basic software engineering. This evaluation showed that the performance of our baseline Claude 3.5 Sonnet agent is comparable to what human baseliners can achieve in around 1 hour. In contrast, our baseline o1 agent displayed slightly lower performance, but after tailoring the agent scaffold to o1, its performance became comparable to what human baseliners can achieve in 2 hours. We call this the “elicited” o1 agent.
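The comparison above reports agent performance in terms of the human time budget it matches. As a hypothetical illustration of that framing (this is not METR's actual scoring code; the function, scores, and time budgets below are invented for the example), one could ask: what is the largest time budget at which the agent's mean task score still matches or exceeds the human baseliners' mean?

```python
# Hypothetical sketch: map an agent's mean task score to the largest
# human time budget it matches. Invented for illustration only; METR's
# real methodology is not detailed in this excerpt.
from statistics import mean

def matched_time_horizon(agent_scores, human_scores_by_budget):
    """agent_scores: per-task scores in [0, 1].
    human_scores_by_budget: {minutes: per-task scores achieved by
    human baseliners limited to that time budget}.
    Returns the largest budget (minutes) the agent matches, else 0."""
    agent_mean = mean(agent_scores)
    matched = 0
    for minutes in sorted(human_scores_by_budget):
        if agent_mean >= mean(human_scores_by_budget[minutes]):
            matched = minutes
    return matched

# Toy numbers (invented):
agent = [0.9, 0.4, 0.6, 0.5]
humans = {
    30: [0.5, 0.3, 0.4, 0.4],   # 30-minute budget
    60: [0.7, 0.5, 0.6, 0.6],   # 1-hour budget
    120: [0.9, 0.7, 0.8, 0.8],  # 2-hour budget
}
print(matched_time_horizon(agent, humans))  # → 60
```

Under these toy numbers the agent matches the 1-hour human baseline but not the 2-hour one, which is the shape of the comparison made for the baseline Claude 3.5 Sonnet agent above.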
Compared to our baseline agent scaffold, the elicited agent for o1 follows a more complex sequence of steps. In particular, it follows an advisor-actor-rater loop with t
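An advisor-actor-rater loop of the kind named above can be sketched generically (the cached excerpt cuts off before describing METR's actual scaffold, so everything here, including the function names and toy components, is an assumption for illustration): an advisor proposes guidance, an actor drafts candidate actions, and a rater scores the candidates so the best one is executed.

```python
# Illustrative sketch of a generic advisor-actor-rater agent loop.
# Hypothetical: not METR's scaffold, whose details are truncated above.
from typing import Callable, List

def run_agent_loop(
    advisor: Callable[[str], str],           # proposes high-level guidance
    actor: Callable[[str, str], List[str]],  # drafts candidate actions
    rater: Callable[[str, str], float],      # scores each candidate
    task: str,
    max_steps: int = 3,
) -> List[str]:
    """Each step: the advisor suggests guidance for the current state,
    the actor drafts candidates, and the rater's top-scoring candidate
    is appended to the transcript and folded into the state."""
    transcript: List[str] = []
    state = task
    for _ in range(max_steps):
        guidance = advisor(state)
        candidates = actor(state, guidance)
        best = max(candidates, key=lambda a: rater(state, a))
        transcript.append(best)
        state = f"{state}\n{best}"
    return transcript

# Toy components standing in for model calls:
advisor = lambda s: "prefer shorter actions"
actor = lambda s, g: ["run tests", "refactor the whole module", "ls"]
rater = lambda s, a: -len(a)  # shorter is "better" under the toy advisor

print(run_agent_loop(advisor, actor, rater, "fix the failing build", max_steps=2))
# → ['ls', 'ls']
```

The point of the extra rater stage is that candidate actions are compared before execution rather than taken greedily, which is one plausible reason a tailored scaffold can elicit stronger performance from the same underlying model.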
... (truncated, 10 KB total)
Resource ID: 89b92e6423256fc4 | Stable ID: sid_bT3HGVTQiO