How AISI Is Addressing the Gap Between AI Capabilities and Safety Mitigations

government

UK AI Safety Institute·aisi.gov.uk/blog/aisis-research-direction-for-technical-s...

Credibility Rating

4/5

High(4)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: UK AI Safety Institute

Official UK government AI Security Institute blog post from March 2025 detailing their technical research agenda; relevant for tracking state-level institutional priorities in AI safety and identifying funded research directions.

Metadata

Importance: 62/100blog postprimary source

Summary

The UK AI Security Institute (AISI) outlines its technical research strategy to close the growing gap between AI capabilities and available safety mitigations. Research is organized into two pillars: Safeguards Analysis (countering misuse by adversarial actors) and Control & Alignment (preventing loss of control of highly capable systems). AISI is actively hiring researchers and offering up to £200,000 in Challenge Fund grants to academics and non-profits to accelerate progress on critical safety challenges.

Key Points

•AISI identifies a widening gap between advancing AI capabilities and available safety mitigations, spanning misuse risks and loss-of-control scenarios.
•Safeguards Analysis focuses on three areas: defending hosted frontier AI systems, defending against third-party attacks (data/model/prompt poisoning), and mitigating open-weight model misuse.
•The Control & Alignment pillar focuses on designing and evaluating control protocols robust to intentional subversion by potentially misaligned AI systems.
•AISI launched a Challenge Fund providing grants up to £200,000 to academics and non-profits to rapidly advance AI safety and security science.
•The institute is actively hiring research scientists and engineers across both the Safeguards Analysis and Control & Alignment teams.

Cited by 1 page

Page	Type	Quality
Capability-Alignment Race Model	Analysis	62.0

Cached Content Preview

HTTP 200Fetched Apr 9, 20268 KB

How we’re addressing the gap between AI capabilities and mitigations | AISI Work 

 

 Read the Frontier AI Trends Report Please enable javascript for this website. 
 A 
 
 A 
 Careers 
 
 
 
 Blog 
 
 Organisation How we’re addressing the gap between AI capabilities and mitigations

 We outline our approach to technical solutions for misuse and loss of control.

 — Mar 11, 2025 As AI capabilities advance, they present security risks which the AI Security Institute (AISI) is working to mitigate and control. This extends beyond our risk assessments , to conducting and funding technical research to actively improve safety and security measures.  

 However, the gap between capabilities and mitigations is growing fast. This gap exists across different risks, including adversarial actors misusing AI systems to facilitate large-scale harms, the possibility of developers losing control of highly intelligent AI systems, and a range of other potentially destabilising effects AI could have on society. This blog outlines our approach to technical solutions for misuse and loss of control . 

 We&#x27;re hiring research scientists and engineers across our Safeguards Analysis team (to counter misuse) as well as our Control and Alignment teams (to counter loss of control).  

 To rapidly advance the science of AI safety and security, AISI aims to maximise the quality research tackling this problem. This is why we recently established the Challenge Fund , which will provide grants of up to £200,000 to academics and non-profits. 

 Safeguards Analysis 

 The risk of attackers misusing AI to cause large-scale harm or disrupting their operation to damage critical systems increases as AI systems advance. Research that seeks to understand, evaluate, and improve the technical measures designed to address these risks (safeguards) is urgently needed.  

 We’re excited about progress on three key challenges: 

 Defending hosted frontier AI systems against misuse. When AI systems are hosted by benign actors, a range of safeguards can be used to prevent attackers using the AI to aid in illegal activity. These can include model-level safeguards like safety training; system-level safeguards like real-time monitors; access safeguards like giving vetted users or organisations access to less safeguarded or more advanced systems; or maintenance safeguards like rapid vulnerability remediation. We must also defend alternate forms of model access, such as insider threats and fine-tuning APIs.  
 Defending against 3rd party attacks. Exposure to attacker-controlled data during training (“data poisoning”) or inference (“prompt-injection”)—as well as direct adversarial manipulation of model weights (“model poisoning”)—can lead to adversaries controlling the actions or goals of otherwise benign AI systems. This could lead to AI agents exfiltrating sensitive information or causing large amounts of harm, or broad attacks on the availability of critical AI systems.  
 Mitigating misuse

... (truncated, 8 KB total)

Resource ID: 6aaf36a7ca97c4d0 | Stable ID: sid_gVi9jQDl4U