AISI Frontier AI Trends
Government
Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: UK AI Safety Institute
Published by the UK AI Safety Institute (AISI), this report offers an authoritative government perspective on frontier AI capability trends and safety considerations, useful for tracking official assessments of the AI risk landscape.
Metadata
Summary
A UK AI Safety Institute government assessment documenting exponential performance improvements across frontier AI systems in multiple domains. The report evaluates emerging capabilities and associated risks, calling for robust safeguards as systems advance rapidly. It serves as an official benchmark of the current frontier AI landscape from a national safety authority.
Key Points
- Documents exponential performance improvements in frontier AI systems across multiple capability domains.
- Identifies emerging capabilities that warrant close monitoring and proactive risk assessment.
- Highlights the need for robust safeguards commensurate with rapidly advancing AI performance.
- Represents an official government-level capability assessment from the UK AI Safety Institute.
- Incorporates red-teaming and structured evaluation methodologies to assess frontier model risks.
Review
Cited by 19 pages
| Page | Type | Quality |
|---|---|---|
| AI Risk Interaction Matrix | Analysis | 65.0 |
| Frontier AI Labs (Overview) | -- | 85.0 |
| METR | Organization | 66.0 |
| UK AI Safety Institute | Organization | 52.0 |
| Capability Elicitation | Approach | 91.0 |
| Dangerous Capability Evaluations | Approach | 64.0 |
| Eval Saturation & The Evals Gap | Approach | 65.0 |
| Evals-Based Deployment Gates | Approach | 66.0 |
| AI Evaluations | Research Area | 72.0 |
| AI Evaluation | Approach | 72.0 |
| International AI Safety Summit Series | Event | 63.0 |
| Third-Party Model Auditing | Approach | 64.0 |
| AI Output Filtering | Approach | 63.0 |
| Refusal Training | Approach | 63.0 |
| Seoul Declaration on AI Safety | Policy | 60.0 |
| Technical AI Safety Research | Crux | 66.0 |
| Compute Thresholds | Concept | 91.0 |
| AI Value Lock-in | Risk | 64.0 |
| AI Capability Sandbagging | Risk | 67.0 |
Cached Content Preview
Frontier AI Trends Report by The AI Security Institute (AISI)
Our first public, evidence‑based assessment of how the world’s most advanced AI systems are evolving, bringing together results from two years of AISI's frontier model testing.
Executive Summary
The UK AI Security Institute (AISI) has conducted evaluations of frontier AI systems since November 2023 across domains critical to national security and public safety. This report presents our first public analysis of the trends we've observed. It seeks to provide accessible, data-driven insights into the frontier of AI capabilities and promote a shared understanding among governments, industry, and the public.
AI capabilities are improving rapidly across all tested domains. Performance in some areas is doubling every eight months, and expert baselines are being surpassed rapidly (Figure 1).
In the cyber domain, AI models can now complete apprentice-level tasks 50% of the time on average, compared to just over 10% of the time in early 2024 (Figure 10). In 2025, we tested the first model that could successfully complete expert-level tasks typically requiring over 10 years of experience for a human practitioner. The length of cyber tasks (expressed as how long they would take a human expert) that models can complete unassisted is doubling roughly every eight months (Figure 3). On other tasks testing autonomy skills, the most advanced systems we've tested can autonomously complete software tasks that would take a human expert over an hour (Figure 2).
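The "doubling roughly every eight months" trend implies a simple exponential model for the task horizon. As an illustrative sketch (not AISI's actual methodology, and with made-up starting values), the projection and its inverse look like this:

```python
from math import log2

# Reported trend: the human-expert time-horizon of cyber tasks that
# models can complete unassisted doubles roughly every 8 months, i.e.
#     horizon(t) = horizon(0) * 2 ** (t / 8)
DOUBLING_MONTHS = 8

def projected_horizon(start_minutes: float, months_elapsed: float) -> float:
    """Task length (in human-expert minutes) reachable after `months_elapsed`."""
    return start_minutes * 2 ** (months_elapsed / DOUBLING_MONTHS)

def months_to_reach(start_minutes: float, target_minutes: float) -> float:
    """Months needed for the horizon to grow from `start_minutes` to `target_minutes`."""
    return DOUBLING_MONTHS * log2(target_minutes / start_minutes)

# Hypothetical example: growing from a 60-minute horizon to an 8-hour one.
print(projected_horizon(60, 24))  # 480.0 (three doublings in two years)
print(months_to_reach(60, 480))   # 24.0
```

Under this model a one-hour horizon would reach a full working day in roughly two years; the report's figures, not this sketch, are the authoritative trend data.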
In chemistry and biology, AI models have far surpassed PhD-level experts on some measures of domain-specific expertise. They first reached our expert baseline for open-ended questions in 2024 and now exceed it by up to 60% (Figure 5). Models are also increasingly able to provide real-time lab support; in late 2024 we saw the first models able to generate protocols for scientific experiments that were judged to be accurate (Figure 7). These have since been proven feasible to implement in a wet lab. Today's systems are also up to 90% better than human experts at providing troubleshooting support for wet lab experiments (Figure 8).
Model safeguards are improving, but vulnerabilities remain.
The models with the strongest safeguards now require longer, more sophisticated attacks to jailbreak for certain malicious request categories (we found a 40x difference in the expert effort required to jailbreak two models released six months apart, Figure 13). However, the efficacy of safeguards varies between models, and we have managed to find vulnerabilities in every system we have tested.
Some of the capabilities that would be required for AI mode
... (truncated, 76 KB total)