AISI Frontier AI Trends
Government
Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: UK AI Safety Institute
Published by the UK AI Safety Institute (AISI), this report offers an authoritative government perspective on frontier AI capability trends and safety considerations, useful for tracking official assessments of the AI risk landscape.
Metadata
Summary
A UK AI Safety Institute government assessment documenting exponential performance improvements across frontier AI systems in multiple domains. The report evaluates emerging capabilities and associated risks, calling for robust safeguards as systems advance rapidly. It serves as an official benchmark of the current frontier AI landscape from a national safety authority.
Key Points
- Documents exponential performance improvements in frontier AI systems across multiple capability domains.
- Identifies emerging capabilities that warrant close monitoring and proactive risk assessment.
- Highlights the need for robust safeguards commensurate with rapidly advancing AI performance.
- Represents an official government-level capability assessment from the UK AI Safety Institute.
- Incorporates red-teaming and structured evaluation methodologies to assess frontier model risks.
Review
Cited by 19 pages
| Page | Type | Quality |
|---|---|---|
| AI Risk Interaction Matrix | Analysis | 65.0 |
| Frontier AI Labs (Overview) | -- | 85.0 |
| METR | Organization | 66.0 |
| UK AI Safety Institute | Organization | 52.0 |
| Capability Elicitation | Approach | 91.0 |
| Dangerous Capability Evaluations | Approach | 64.0 |
| Eval Saturation & The Evals Gap | Approach | 65.0 |
| Evals-Based Deployment Gates | Approach | 66.0 |
| AI Evaluations | Research Area | 72.0 |
| AI Evaluation | Approach | 72.0 |
| International AI Safety Summit Series | Event | 63.0 |
| Third-Party Model Auditing | Approach | 64.0 |
| AI Output Filtering | Approach | 63.0 |
| Refusal Training | Approach | 63.0 |
| Seoul Declaration on AI Safety | Policy | 60.0 |
| Technical AI Safety Research | Crux | 66.0 |
| Compute Thresholds | Concept | 91.0 |
| AI Value Lock-in | Risk | 64.0 |
| AI Capability Sandbagging | Risk | 67.0 |
Cached Content Preview
Frontier AI Trends Report by The AI Security Institute (AISI)
Our first public, evidence‑based assessment of how the world’s most advanced AI systems are evolving, bringing together results from two years of AISI's frontier model testing.
Executive Summary
The UK AI Security Institute (AISI) has conducted evaluations of frontier AI systems since November 2023 across domains critical to national security and public safety. This report presents our first public analysis of the trends we've observed. It seeks to provide accessible, data-driven insights into the frontier of AI capabilities and promote a shared understanding among governments, industry, and the public.
AI capabilities are improving rapidly across all tested domains. Performance in some areas is doubling every eight months, and expert baselines are being surpassed rapidly (Figure 1).
In the cyber domain, AI models can now complete apprentice-level tasks 50% of the time on average, compared to just over 10% of the time in early 2024 (Figure 10). In 2025, we tested the first model that could successfully complete expert-level tasks typically requiring over 10 years of experience for a human practitioner. The length of cyber tasks (expressed as how long they would take a human expert) that models can complete unassisted is doubling roughly every eight months (Figure 3). On other tasks testing autonomy skills, the most advanced systems we've tested can autonomously complete software tasks that would take a human expert over an hour (Figure 2).
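The "doubling roughly every eight months" trend implies a simple exponential model for the task horizon. As an illustrative sketch (not AISI's actual methodology, and with made-up starting values), the projection and its inverse look like this:

```python
from math import log2

# Reported trend: the human-expert time-horizon of cyber tasks that
# models can complete unassisted doubles roughly every 8 months, i.e.
#     horizon(t) = horizon(0) * 2 ** (t / 8)
DOUBLING_MONTHS = 8

def projected_horizon(start_minutes: float, months_elapsed: float) -> float:
    """Task length (in human-expert minutes) reachable after `months_elapsed`."""
    return start_minutes * 2 ** (months_elapsed / DOUBLING_MONTHS)

def months_to_reach(start_minutes: float, target_minutes: float) -> float:
    """Months needed for the horizon to grow from `start_minutes` to `target_minutes`."""
    return DOUBLING_MONTHS * log2(target_minutes / start_minutes)

# Hypothetical example: growing from a 60-minute horizon to an 8-hour one.
print(projected_horizon(60, 24))  # 480.0 (three doublings in two years)
print(months_to_reach(60, 480))   # 24.0
```

Under this model a one-hour horizon would reach a full working day in roughly two years; the report's figures, not this sketch, are the authoritative trend data.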
In chemistry and biology, AI models have far surpassed PhD-level experts on some measures of domain-specific expertise. They first reached our expert baseline for open-ended questions in 2024 and now exceed it by up to 60% (Figure 5). Models are also increasingly able to provide real-time lab support; in late 2024 we saw the first models able to generate protocols for scientific experiments that were judged to be accurate (Figure 7). These have since been proven feasible to implement in a wet lab. Today's systems are also up to 90% better than human experts at providing troubleshooting support for wet lab experiments (Figure 8).
Model safeguards are improving, but vulnerabilities remain.
The models with the strongest safeguards now require longer, more sophisticated attacks to jailbreak for certain malicious request categories (we found a 40x difference in the expert effort required to jailbreak two models released six months apart, Figure 13). However, the efficacy of safeguards varies between models, and we have managed to find vulnerabilities in every system we have tested.
Some of the capabilities that would be required for AI mode
... (truncated, 76 KB total)