Longterm Wiki

Research from the University of Illinois

web

Key empirical evidence that frontier LLMs lower the barrier to cyberattacks; relevant to AI risk assessments, deployment policy debates, and discussions of capability thresholds for dangerous use.

Metadata

Importance: 72/100 · news article

Summary

This IBM Think article summarizes University of Illinois research demonstrating that GPT-4 can autonomously exploit 87% of 'one-day' (recently disclosed but unpatched) cybersecurity vulnerabilities when given CVE descriptions. The finding highlights the dual-use risk of advanced LLMs as tools for automated cyberattacks, requiring only publicly available vulnerability information to achieve high exploitation rates.

Key Points

  • GPT-4 successfully exploited 87% of one-day vulnerabilities in controlled tests when provided CVE descriptions.
  • The research used real-world CVEs, showing LLMs can translate public vulnerability disclosures into working exploits.
  • Weaker models (GPT-3.5, open-source LLMs) performed significantly worse, suggesting capability thresholds matter for cyber risk.
  • Findings raise urgent questions about responsible disclosure timelines and LLM access controls in security contexts.
  • Demonstrates that frontier AI models represent a qualitative leap in the accessibility of cyberattack capabilities.

Cited by 1 page

Page: Cyberweapons Risk · Type: Risk · Quality: 91.0

Cached Content Preview

HTTP 200 · Fetched Apr 9, 2026 · 11 KB
ChatGPT 4 can exploit 87% of one-day vulnerabilities | IBM 
Security · Artificial Intelligence
ChatGPT 4 can exploit 87% of one-day vulnerabilities

Authors

Jennifer Gregory, Cybersecurity Writer
Since ChatGPT and other large language models (LLMs) came into widespread use in recent years, cybersecurity has been a top concern. Among the many questions, cybersecurity professionals wondered how effective these tools would be at launching an attack. Cybersecurity researchers Richard Fang, Rohan Bindu, Akul Gupta and Daniel Kang recently performed a study to find out. The conclusion: they are very effective.
ChatGPT 4 quickly exploited one-day vulnerabilities
During the study, the team used 15 one-day vulnerabilities that occurred in real life. A one-day vulnerability is one that has been publicly disclosed but not yet patched, meaning it is a known vulnerability during the window between disclosure and patch availability. Cases included vulnerable websites, container management software and Python packages. Because all the vulnerabilities came from the CVE database, each included its CVE description.
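The "one-day" window described above can be made concrete: it is the gap between public disclosure and patch availability. A minimal sketch in Python (the function and dates are illustrative, not taken from the study):

```python
from datetime import date
from typing import Optional

def exposure_window(disclosed: date, patched: Optional[date]) -> Optional[int]:
    """Days a flaw remained a 'one-day' vulnerability: publicly
    disclosed but not yet patched. Returns None while no patch
    exists, i.e. while the window is still open."""
    if patched is None:
        return None
    return (patched - disclosed).days
```

For example, a flaw disclosed on January 10 and patched on January 17 was, for 7 days, exploitable using public information alone.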


The LLM agents also had web browsing capability, a terminal, search results, file creation and a code interpreter. Additionally, the researchers used a very detailed prompt totaling 1,056 tokens and 91 lines of code, including debugging and logging statements. The agents did not, however, use sub-agents or a separate planning module.
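The tool set described above (terminal, file creation, code interpreter, browsing) is the standard shape of an LLM agent scaffold: the model emits tool calls, and a dispatch layer routes each call to a handler. A hedged sketch of such a dispatch layer, with stubbed handlers; the tool names and structure are assumptions for illustration, not the researchers' actual code:

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class ToolCall:
    """A tool invocation as the model might emit it (hypothetical format)."""
    name: str
    argument: str

def run_terminal(cmd: str) -> str:
    # A real scaffold would execute a shell command; stubbed here for safety.
    return f"$ {cmd}\n(simulated output)"

def write_file(spec: str) -> str:
    # spec assumed to be "path::contents"; purely illustrative.
    path, _, contents = spec.partition("::")
    return f"wrote {len(contents)} bytes to {path}"

# Registry mapping tool names to handlers (names are invented examples).
TOOLS: Dict[str, Callable[[str], str]] = {
    "terminal": run_terminal,
    "write_file": write_file,
}

def dispatch(call: ToolCall) -> str:
    """Route one model-issued tool call to its handler, as an agent loop would,
    and return the observation fed back into the model's context."""
    handler = TOOLS.get(call.name)
    if handler is None:
        return f"unknown tool: {call.name}"
    return handler(call.argument)
```

In a full agent loop, the string each handler returns would be appended to the conversation so the model can decide its next step.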


 The team quickly learned that ChatGPT was able to correctly exploit one-day vulnerabilities 87% of the time. All the other methods tested, which included LLMs and open-sourc

... (truncated, 11 KB total)
Resource ID: 674736d5e6082df6 | Stable ID: sid_gyMnH88SmC