Longterm Wiki
Back

AISI, *How we're addressing the gap between AI capabilities and mitigations* (https://aisi.gov.uk/blog/aisis-research...

government

Data Status

Not fetched

Cited by 1 page

PageTypeQuality
Capability-Alignment Race ModelAnalysis62.0

Cached Content Preview

HTTP 200Fetched Feb 23, 20268 KB
How we’re addressing the gap between AI capabilities and mitigations | AISI Work 

 

 Read the Frontier AI Trends Report Please enable javascript for this website. 
 A 
 
 A 
 Careers 
 
 
 
 Blog 
 
 Organisation How we’re addressing the gap between AI capabilities and mitigations

 We outline our approach to technical solutions for misuse and loss of control.

 — Mar 11, 2025 As AI capabilities advance, they present security risks which the AI Security Institute (AISI) is working to mitigate and control. This extends beyond our risk assessments , to conducting and funding technical research to actively improve safety and security measures.  

 However, the gap between capabilities and mitigations is growing fast. This gap exists across different risks, including adversarial actors misusing AI systems to facilitate large-scale harms, the possibility of developers losing control of highly intelligent AI systems, and a range of other potentially destabilising effects AI could have on society. This blog outlines our approach to technical solutions for misuse and loss of control . 

 We're hiring research scientists and engineers across our Safeguards Analysis team (to counter misuse) as well as our Control and Alignment teams (to counter loss of control).  

 To rapidly advance the science of AI safety and security, AISI aims to maximise the quality research tackling this problem. This is why we recently established the Challenge Fund , which will provide grants of up to £200,000 to academics and non-profits. 

 Safeguards Analysis 

 The risk of attackers misusing AI to cause large-scale harm or disrupting their operation to damage critical systems increases as AI systems advance. ​​Research that seeks to understand, evaluate, and improve the technical measures designed to address these risks (safeguards) is urgently needed.  

 We’re excited about progress on three key challenges: 

 Defending hosted frontier AI systems against misuse. When AI systems are hosted by benign actors, a range of safeguards can be used to prevent attackers using the AI to aid in illegal activity. These can include model-level safeguards like safety training; system-level safeguards like real-time monitors; access safeguards like giving vetted users or organisations access to less safeguarded or more advanced systems; or maintenance safeguards like rapid vulnerability remediation. We must also defend alternate forms of model access, such as insider threats and fine-tuning APIs.  
 Defending against 3rd party attacks. Exposure to attacker-controlled data during training (“data poisoning”) or inference (“prompt-injection”)—as well as direct adversarial manipulation of model weights (“model poisoning”)—can lead to adversaries controlling the actions or goals of otherwise benign AI systems. This could lead to AI agents exfiltrating sensitive information or causing large amounts of harm, or broad attacks on the availability of critical AI systems. ​ 
 Mitigating misuse

... (truncated, 8 KB total)
Resource ID: 6aaf36a7ca97c4d0 | Stable ID: MmU5NjY1N2