
Russinovich et al., *The Crescendo Multi-Turn LLM Jailbreak Attack*, USENIX Security 2025 (https://usenix.org/system/...

Great, Now Write an Article About That:
The Crescendo Multi-Turn LLM Jailbreak Attack
Mark Russinovich
Microsoft Azure
Ahmed Salem
Microsoft
Ronen Eldan
Microsoft
Abstract
Large Language Models (LLMs) have risen significantly in popularity and are increasingly being adopted across multiple applications. These LLMs are heavily aligned to resist engaging in illegal or unethical topics as a means to avoid contributing to responsible AI harms. However, a recent line of attacks, known as "jailbreaks", seeks to overcome this alignment. Intuitively, jailbreak attacks aim to narrow the gap between what the model can do and what it is willing to do. In this paper, we introduce a novel jailbreak attack called Crescendo. Unlike existing jailbreak methods, Crescendo is a simple multi-turn jailbreak that interacts with the model in a seemingly benign manner. It begins with a general prompt or question about the task at hand and then gradually escalates the dialogue by referencing the model's replies, progressively leading to a successful jailbreak. We evaluate Crescendo on various public systems, including ChatGPT, Gemini Pro, Gemini-Ultra, LLaMA-2 70b and LLaMA-3 70b Chat, and Anthropic Chat. Our results demonstrate the strong efficacy of Crescendo, achieving high attack success rates across all evaluated models and tasks. Furthermore, we present Crescendomation¹, a tool that automates the Crescendo attack, and demonstrate its efficacy against state-of-the-art models through our evaluations. Crescendomation surpasses other state-of-the-art jailbreaking techniques on the AdvBench subset dataset, achieving 29-61% higher performance on GPT-4 and 49-71% on Gemini-Pro. Finally, we also demonstrate Crescendo's ability to jailbreak multimodal models.
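The escalation strategy the abstract describes can be sketched as a loop that starts from a benign seed prompt and builds each follow-up turn on the model's own previous reply. This is a minimal illustrative sketch, not the paper's actual implementation (which ships as Crescendomation in PyRIT); `ask_model` and the escalation phrasing are hypothetical stand-ins.

```python
def ask_model(history):
    """Hypothetical stand-in for a chat-completion API call.

    A real run would send `history` (a list of role/content
    messages) to an LLM endpoint and return its reply text.
    """
    return f"reply-to-turn-{len(history) // 2 + 1}"


def crescendo(seed_prompt, escalations):
    """Multi-turn escalation: begin with a general, benign prompt,
    then have each follow-up turn reference the model's last reply,
    gradually steering the dialogue toward the target task."""
    history = [{"role": "user", "content": seed_prompt}]
    replies = []
    for step in escalations:
        reply = ask_model(history)
        replies.append(reply)
        history.append({"role": "assistant", "content": reply})
        # Each follow-up explicitly builds on the model's own output,
        # which is what keeps every individual turn looking benign.
        follow_up = f"Building on what you just said, {step}"
        history.append({"role": "user", "content": follow_up})
    return replies


turns = crescendo("Tell me about the history of X.",
                  ["expand on the first point.",
                   "give more concrete detail."])
```

The key design point, per the abstract, is that no single turn contains an overtly harmful request; the escalation emerges only across the whole dialogue.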
Disclaimer: This paper contains examples of harmful and offensive language; reader discretion is advised.
1 Introduction
Recent advancements in large language models (LLMs) have fueled their adoption into the products of numerous companies, including Microsoft, Google, and OpenAI. Concurrently, multiple research studies have been examining the security [28, 36] and privacy risks [9, 17, 23, 29] associated with these LLMs. One of the most notable security threats is the concept of "jailbreaks". Most LLMs are safety-aligned [8, 15, 19, 25], meaning they are trained to avoid performing illegal or unethical tasks or generating harmful content in general. Jailbreak attacks aim to disrupt this alignment, enabling LLMs to execute arbitrary malicious tasks.

There are various forms of jailbreaks. For instance, optimization-based jailbreaks [21, 36] involve adversaries optimizing a suffix to circumvent the model's safety measures. These methods mostly require white-box access to the target LLMs, rendering them ineffective against black-box mode

¹Crescendomation is available as part of PyRIT (https://github.com/Azure/PyRIT) and is also accessible on Zenodo (https://zenodo.org/records/14673783).

... (truncated, 70 KB total)