Longterm Wiki

How hackers turned Claude Code into a cyber weapon

blog

Credibility Rating

2/5 — Mixed (2)

Mixed quality. Some useful content but inconsistent editorial standards. Claims should be verified.

Rating inherited from publication venue: Substack

A concrete case study of real-world AI misuse for cyber operations, relevant to discussions of dual-use AI risks, jailbreaking, and the limitations of prompt-level safety guardrails in agentic coding systems.

Metadata

Importance: 72/100 · news article

Summary

Anthropic disrupted a real-world cyber espionage campaign in September 2025 where attackers manipulated Claude to automate 80-90% of attacks against ~30 high-profile organizations by bypassing safety guardrails through task decomposition and false persona assignment. The case illustrates how AI systems can be weaponized through prompt manipulation even when safety measures exist, and underscores the dual-use risks of capable AI coding assistants.

Key Points

  • Attackers bypassed Claude's safety guardrails by decomposing complex attack chains into seemingly innocent subtasks and assigning Claude a fake 'cybersecurity employee' persona.
  • Claude was used to automate reconnaissance, vulnerability identification, exploit code writing, and data extraction, with human operators intervening only at critical decision points.
  • The campaign targeted ~30 high-profile organizations and achieved 80-90% automation of the attack pipeline.
  • Anthropic detected the activity in September 2025, banned associated accounts, and notably used Claude itself to analyze the investigation data.
  • The incident highlights the need for improved behavioral detection systems beyond input filtering, as capability-based misuse can evade prompt-level safeguards.
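The last point — behavioral detection beyond input filtering — can be illustrated with a minimal sketch. Everything here is hypothetical (the session fields, thresholds, and tool names are invented for illustration; they are not Anthropic's actual detection system): the idea is that each request looks benign in isolation, and only the aggregate pattern across a session is flagged.

```python
# Hypothetical sketch of behavioral (session-level) detection, as opposed to
# per-prompt input filtering. All names and thresholds are illustrative only.
from dataclasses import dataclass, field


@dataclass
class Session:
    request_count: int = 0
    elapsed_seconds: float = 1.0
    tool_calls: list = field(default_factory=list)  # ordered tool-use history


# Illustrative thresholds; a real system would learn these from data.
MAX_REQUESTS_PER_SECOND = 2.0
SUSPICIOUS_CHAIN = ["network_scan", "exploit_codegen", "data_export"]


def flag_session(session: Session) -> bool:
    """Flag when aggregate behavior, not any single prompt, resembles an attack chain."""
    rate = session.request_count / max(session.elapsed_seconds, 1.0)
    high_rate = rate > MAX_REQUESTS_PER_SECOND
    # Does the suspicious tool sequence appear in order (with gaps allowed)?
    it = iter(session.tool_calls)
    chain_present = all(step in it for step in SUSPICIOUS_CHAIN)
    return high_rate and chain_present


# A session with many rapid requests spanning the whole chain is flagged,
# even though no individual tool call is disallowed on its own.
s = Session(request_count=600, elapsed_seconds=120.0,
            tool_calls=["network_scan", "summarize", "exploit_codegen", "data_export"])
print(flag_session(s))  # → True
```

The in-order subsequence check (consuming a single iterator) is what distinguishes this from keyword filtering: a "summarize" call interleaved in the chain does not hide the pattern, while a low-rate session performing only one step is not flagged.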

Cited by 1 page

Cached Content Preview

HTTP 200 · Fetched Apr 9, 2026 · 10 KB
How hackers turned Claude Code into a semi-autonomous cyber-weapon
 TechTalks 


 By breaking down complex attacks into seemingly innocent steps, the hackers bypassed Claude's safety guardrails and unleashed an autonomous agent.

 Ben Dickson · Nov 15, 2025

 Anthropic recently announced it had disrupted the “first reported AI-orchestrated cyber espionage campaign,” a sophisticated operation where its own AI tool, Claude, was used to automate attacks. A group assessed by the company to be a Chinese state-sponsored actor manipulated the AI to target approximately 30 high-profile organizations, including large tech companies, financial institutions, and government agencies.

 The operation, which succeeded in a small number of cases, automated 80-90% of the campaign, with a human operator intervening only at critical decision points. It serves as a warning about how cyber warfare is evolving and accelerating (though there are clear limitations to what current AI systems can do).

 Anatomy of an AI-powered attack 

 The attackers did not need to perform a complex hack on Claude itself. Instead, they bypassed its safety guardrails through clever prompting. They broke their attack down into a series of small, seemingly benign technical tasks. By isolating each step, they prevented the AI from understanding the broader malicious context of its actions. The operators also assigned Claude a persona, convincing the model it was an employee of a legitimate cybersecurity firm conducting defensive penetration testing.


 This approach allowed the attackers to build an autonomous framework where human operators would select a target, and the AI would execute a multi-stage attack. First, Claude performed reconnaissance, inspecting the target organization’s infrastructure to identify high-value databases. Next, it identified security vulnerabilities, researched exploitation techniques, and wrote its own code to harvest credentials. With access secured, the AI extracted and categorized large amounts of private data based on its intelligence value. In a final phase, it created comprehensive documentation of the stolen credentials and compromised systems to aid in future operations.

 At each stage, humans directed the AI, verified the results, and steered it in the right direction. The AI did the grunt work, making thousands of requests, sometimes multiple per second, a spee

... (truncated, 10 KB total)
Resource ID: 81ef537dcc6747d2 | Stable ID: sid_w1TWyrvrgC