Research from the University of Illinois
Key empirical evidence that frontier LLMs lower the barrier to cyberattacks; relevant to AI risk assessments, deployment-policy debates, and discussions of capability thresholds for dangerous use.
Metadata
Importance: 72/100 · News article
Summary
This IBM Think article summarizes University of Illinois research demonstrating that GPT-4 can autonomously exploit 87% of 'one-day' (recently disclosed but unpatched) cybersecurity vulnerabilities when given CVE descriptions. The finding highlights the dual-use risk of advanced LLMs as tools for automated cyberattacks, requiring only publicly available vulnerability information to achieve high exploitation rates.
Key Points
- GPT-4 successfully exploited 87% of one-day vulnerabilities in controlled tests when provided CVE descriptions.
- The research used real-world CVEs, showing LLMs can translate public vulnerability disclosures into working exploits.
- Weaker models (GPT-3.5, open-source LLMs) performed significantly worse, suggesting capability thresholds matter for cyber risk.
- Findings raise urgent questions about responsible disclosure timelines and LLM access controls in security contexts.
- Demonstrates that frontier AI models represent a qualitative leap in the accessibility of cyberattack capabilities.
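For scale, the headline 87% figure is consistent with 13 successful exploits out of the study's 15 tested CVEs; a minimal sanity check (the 13/15 breakdown is inferred from the reported percentages, not stated in this summary):

```python
# Sanity check: a reported 87% success rate over 15 CVEs is consistent
# with 13 successful exploits (13/15 = 86.7%, rounding to 87%).
successes, total = 13, 15
rate = round(successes / total * 100)
print(rate)  # 87
```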
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Cyberweapons Risk | Risk | 91.0 |
Cached Content Preview
HTTP 200 · Fetched Apr 9, 2026 · 11 KB
ChatGPT 4 can exploit 87% of one-day vulnerabilities | IBM
Authors
Jennifer Gregory
Cybersecurity Writer
Since ChatGPT and other large language models (LLMs) came into widespread use in recent years, cybersecurity has been a top concern. Among the many questions, cybersecurity professionals wondered how effective these tools would be at launching an attack. Cybersecurity researchers Richard Fang, Rohan Bindu, Akul Gupta and Daniel Kang recently performed a study to find out. The conclusion: they are very effective.
ChatGPT 4 quickly exploited one-day vulnerabilities
During the study, the team used 15 one-day vulnerabilities drawn from real-world incidents. One-day vulnerabilities are flaws in the window between public disclosure and the release of a patch: the issue is known, but systems remain exploitable. Cases included vulnerable websites, container management software and Python packages. Because all the vulnerabilities came from the CVE database, each included its CVE description.
The LLM agents also had web browsing elements, a terminal, search results, file creation and a code interpreter. Additionally, the researchers used a very detailed prompt with a total of 1,056 tokens and 91 lines of code. The prompt also included debugging and logging statements. The prompts did not, however, include sub-agents or a separate planning module.
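The scaffold described above can be sketched as a configuration object; this is a hypothetical illustration of the setup the article reports (the class, tool names, and function are assumptions, not the authors' code):

```python
# Hypothetical sketch of the agent scaffold described in the article.
# The dataclass, tool names, and build_task helper are illustrative
# assumptions; only the tool categories and prompt size come from the text.
from dataclasses import dataclass, field

@dataclass
class AgentConfig:
    model: str = "gpt-4"  # frontier model under test
    tools: list = field(default_factory=lambda: [
        "web_browser",       # browse pages linked from the advisory
        "terminal",          # run commands against the target
        "search",            # retrieve additional public information
        "file_editor",       # create exploit scripts
        "code_interpreter",  # execute and debug exploit code
    ])
    # The study's prompt totaled ~1,056 tokens across 91 lines, with
    # debugging/logging instructions but no sub-agents or planning module.
    system_prompt_tokens: int = 1056

def build_task(cve_id: str, description: str) -> str:
    """Compose the per-vulnerability task from public CVE text only."""
    return f"Target advisory {cve_id}: {description}"

cfg = AgentConfig()
print(len(cfg.tools))  # 5 tool categories, matching the article's list
```

The point of the sketch is that the agent needs nothing beyond commodity tooling plus the public CVE description, which is what makes the finding a dual-use concern.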
The team quickly learned that ChatGPT was able to correctly exploit one-day vulnerabilities 87% of the time. All the other methods tested, which included LLMs and open-sourc
... (truncated, 11 KB total)
Resource ID: 674736d5e6082df6 | Stable ID: sid_gyMnH88SmC