UK AI Safety Institute renamed to AI Security Institute

government

UK AI Safety Institute·aisi.gov.uk/blog/advanced-ai-evaluations-may-update

Credibility Rating

4/5

High(4)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: UK AI Safety Institute

Published by the UK AISI (now AI Security Institute) in May 2024, this is one of the first systematic government-led empirical evaluations of frontier LLM risks across multiple harm domains, serving as a reference point for AI safety evaluation methodology and policy discussions.

Metadata

Importance: 68/100blog postprimary source

Summary

The UK AI Safety Institute evaluated five anonymized large language models across cyber, chemical/biological, agent, and jailbreak dimensions. Key findings show models exhibit PhD-level CBRN knowledge, limited but real cybersecurity capabilities, nascent agentic behavior, and widespread vulnerability to jailbreaks—providing an early empirical baseline for frontier model risk assessment.

Key Points

•Several LLMs answered 600+ expert-written chemistry and biology questions at PhD-level accuracy, raising dual-use CBRN concern.
•Models completed basic (high-school level) cybersecurity challenges but struggled with university-level tasks, suggesting limited but non-trivial offensive cyber uplift.
•Only two models could complete short-horizon agent tasks; none handled complex multi-step planning, indicating early-stage autonomous capability.
•All tested models remained highly vulnerable to basic jailbreaks; some produced harmful outputs without any deliberate circumvention attempt.
•Models were anonymized (Red/Purple/Green/Blue/Yellow), limiting direct accountability but allowing comparative capability benchmarking across labs.

Cited by 6 pages

Page	Type	Quality
Alignment Research Center (ARC)	Organization	57.0
Capability Elicitation	Approach	91.0
Dangerous Capability Evaluations	Approach	64.0
Evals-Based Deployment Gates	Approach	66.0
Tool-Use Restrictions	Approach	91.0
AI Value Lock-in	Risk	64.0

Cached Content Preview

HTTP 200Fetched Apr 7, 202624 KB

Advanced AI evaluations at AISI: May update | AISI Work 

 

 Read the Frontier AI Trends Report Please enable javascript for this website. 
 A 
 
 A 
 Careers 
 
 
 
 Blog 
 
 Organisation Advanced AI evaluations at AISI: May update 

 We tested leading AI models for cyber, chemical, biological, and agent capabilities and safeguards effectiveness. Our first technical blog post shares a snapshot of our methods and results.

 Technical staff — May 20, 2024 Note to readers: we changed our name to the AI Security Institute on 14 February 2025. Read more here. 

 A key part of our work at the AI Safety Institute (AISI) involves periodically evaluating advanced AI systems to assess the potential harm they could cause. In this post, we present results from our recent evaluations of five large language models (LLMs) that are already used by the public. We assessed: 

 Whether the models could potentially be used to facilitate cyber-attacks; 
 Whether they could provide expert-level knowledge in chemistry and biology that could be used for positive but also harmful purposes; 
 Whether they were capable of autonomously taking sequences of actions (operating as “agents”) in ways that might be difficult for humans to control and 
 Whether they were vulnerable to “jailbreaks” or users attempting to bypass safeguards to elicit potentially harmful outputs (e.g. illegal or toxic content). 
 In a  previous post,  we described our approach to model evaluations. Here, we highlight a selection of recent results: 

 Several LLMs demonstrated expert-level knowledge of chemistry and biology. Models answered over 600 private expert-written chemistry and biology questions at similar levels to humans with PhD-level training. 
 Several LLMs completed simple cyber security challenges aimed at high-school students but struggled with challenges aimed at university students. 
 Two LLMs completed short-horizon agent tasks (such as simple software engineering problems) but were unable to plan and execute sequences of actions for more complex tasks. 
 All tested LLMs remain highly vulnerable to basic jailbreaks, and some will provide harmful outputs even without dedicated attempts to circumvent their safeguards. 
  

 Our approach 

 We assessed five LLMs released by major labs, which are denoted here as the  Red ,  Purple ,  Green ,  Blue  and  Yellow  models (models are anonymised). Models were evaluated by providing them with questions or task prompts and measuring their responses. For some tasks, models were given access to a “scaffold” consisting of external tools, such as a python interpreter allowing them to write executable code.   

 Depending on the task or question type, we measured three types of responses:  

 Compliance : whether the model does or does not comply with a harmful request 
 Correctness : whether the response to a question is correct or not 
 Completion : whether a task (such as a coding challenge) is completed or not 
 We graded these responses using 

... (truncated, 24 KB total)

Resource ID: 4e56cdf6b04b126b | Stable ID: sid_5XnioTLRlQ