Pre-Deployment evaluation of OpenAI's o1 model
Government
Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: UK AI Safety Institute
This is an official government evaluation report from AISI (UK) and its US counterpart, representing one of the first formal pre-deployment government safety assessments of a frontier model and a key case study in operationalizing AI governance frameworks.
Metadata
Summary
The US and UK AI Safety Institutes jointly conducted a pre-deployment safety evaluation of OpenAI's o1 reasoning model, assessing its capabilities in cyber, biological, and software development domains. The evaluation benchmarked o1 against reference models to identify potential risks before public release. This represents an early example of government-led pre-deployment AI safety testing through formal institute collaboration.
Key Points
- First major joint evaluation between the US and UK AI Safety Institutes, establishing a model for international government cooperation on AI safety assessments.
- Tested o1 across high-risk capability domains including cybersecurity, biological threats, and software development to identify uplift risks.
- Compared o1's performance against multiple reference models to contextualize capability levels and potential for misuse.
- Evaluation occurred pre-deployment, reflecting a proactive rather than reactive approach to frontier model risk assessment.
- Represents an operationalization of safety commitments made by frontier AI labs to engage with government safety bodies before release.
Review
Cited by 3 pages
| Page | Type | Quality |
|---|---|---|
| Evals-Based Deployment Gates | Approach | 66.0 |
| International AI Safety Summit Series | Event | 63.0 |
| Third-Party Model Auditing | Approach | 64.0 |
Cached Content Preview
Pre-Deployment evaluation of OpenAI’s o1 model | AISI Work
The UK Artificial Intelligence Safety Institute and the U.S. Artificial Intelligence Safety Institute conducted a joint pre-deployment evaluation of OpenAI's o1 model
Dec 18, 2024. Note to readers: we changed our name to the AI Security Institute on 14 February 2025.
Introduction
The UK Artificial Intelligence Safety Institute (UK AISI) and the U.S. Artificial Intelligence Safety Institute (US AISI) conducted a joint pre-deployment evaluation of OpenAI’s latest model, o1 (released December 5, 2024).
The following is a high-level overview of the evaluations conducted, as well as a snapshot of the findings from each domain tested. A more detailed technical report can be found here.
Overview of the Joint Safety Research & Testing Exercise
US AISI and UK AISI conducted testing during a limited period of pre-deployment access to the o1 model. Testing was conducted by expert engineers, scientists, and subject matter specialists from staff at both Institutes, and the findings were shared with OpenAI before the model was publicly released.
US AISI and UK AISI ran separate but complementary tests to assess the model’s capabilities across three domains: (1) cyber capabilities, (2) biological capabilities, and (3) software and AI development.
To assess the model’s relative capabilities and evaluate the potential real-world impacts of o1 across these areas, US AISI and UK AISI compared its performance to a series of similar reference models: OpenAI’s o1-preview, OpenAI’s GPT-4o, and both the upgraded and earlier versions of Anthropic’s Claude 3.5 Sonnet.
These comparisons are intended only to assess the relative capability improvements of o1, in order to improve scientific interpretation of evaluation results.
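As a rough illustration of this kind of reference-model comparison, the sketch below computes per-domain score deltas between a target model and a set of reference models. The domain names, score values, and scoring scheme are placeholders, not the Institutes' actual data or evaluation harness.

```python
# Hypothetical sketch: contextualising a target model's scores against reference models.
# Model names mirror those listed above; the scores and domains are illustrative only.

REFERENCE_MODELS = ["o1-preview", "gpt-4o", "claude-3.5-sonnet-old", "claude-3.5-sonnet-new"]

def relative_capability(scores: dict[str, dict[str, float]], target: str) -> dict[str, float]:
    """Return the target model's score minus the best reference score, per domain.

    `scores` maps domain -> {model name -> score}; each domain is assumed to
    include the target model and at least one reference model.
    """
    deltas = {}
    for domain, by_model in scores.items():
        best_reference = max(by_model[m] for m in REFERENCE_MODELS if m in by_model)
        deltas[domain] = by_model[target] - best_reference
    return deltas

# Example with made-up numbers: positive deltas indicate relative uplift over references.
example = {
    "cyber": {"o1": 0.42, "o1-preview": 0.38, "gpt-4o": 0.30},
    "bio":   {"o1": 0.55, "o1-preview": 0.51, "gpt-4o": 0.47},
}
print(relative_capability(example, "o1"))
```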
The version of o1 that was tested exhibited a number of performance issues related to tool-calling and output formatting. US AISI and UK AISI took steps to address these issues by adapting their agent designs, including adjusting prompts and introducing simple mechanisms to help the agent recover from errors. The results below reflect o1’s performance with this scaffolding in place. A version of o1 that was better optimized for tool use might exhibit better performance on many evaluations. This report makes no claims about the performance of other versions of o1.
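A minimal sketch of the kind of error-recovery scaffolding described above is given below, assuming a generic tool-calling agent loop. The function and exception names are hypothetical; the Institutes' actual agent harnesses are not described in this post.

```python
# Hypothetical sketch of agent scaffolding that reprompts the model when its
# tool call is malformed. `call_model` and `parse_tool_call` are placeholders.

MAX_RETRIES = 3

def run_agent_step(history, call_model, parse_tool_call):
    """Ask the model for its next tool call, reprompting on malformed output."""
    for _attempt in range(MAX_RETRIES):
        raw_output = call_model(history)
        try:
            return parse_tool_call(raw_output)  # e.g. validate the JSON arguments
        except ValueError as err:
            # Feed the formatting error back so the model can correct itself,
            # analogous to the "simple mechanisms to help the agent recover
            # from errors" mentioned above.
            history.append({
                "role": "user",
                "content": f"Your last tool call was malformed ({err}). "
                           "Please resend it in the expected format.",
            })
    raise RuntimeError("Agent failed to produce a well-formed tool call.")
```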
Methodology
US AISI and UK AISI tested the model by drawing on a range of techniques including:
Question Answering: The model was asked to correctly answer a series of questions that test knowledge or problem solving on a given topic. Answers were most often graded automatically by another model, then checked by a human with knowledge of the correct answers.
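The sketch below illustrates this model-graded question-answering flow, with each answer flagged for a subsequent human check. The callables and prompt wording are assumptions for illustration, not the actual grading harness.

```python
# Hypothetical sketch of automatic grading by another model, followed by human review.
# `answer_model` and `grader_model` are placeholder callables taking a prompt string.

def grade_answers(questions, answer_key, answer_model, grader_model):
    results = []
    for question, reference in zip(questions, answer_key):
        candidate = answer_model(question)
        verdict = grader_model(
            f"Question: {question}\n"
            f"Reference answer: {reference}\n"
            f"Candidate answer: {candidate}\n"
            "Reply with 'correct' or 'incorrect'."
        )
        results.append({
            "question": question,
            "candidate": candidate,
            "auto_grade": verdict.strip().lower() == "correct",
            # A subject-matter expert re-checks the automatic grade,
            # as described in the methodology above.
            "needs_human_check": True,
        })
    return results
```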
Agent Tasks: The model opera
... (truncated, 9 KB total) | Stable ID: sid_peJnCBcxL9