
Russinovich et al., *The Crescendo Multi-Turn LLM Jailbreak Attack*, USENIX Security 2025 (https://usenix.org/system/...

Great, Now Write an Article About That:
The Crescendo Multi-Turn LLM Jailbreak Attack
Mark Russinovich
Microsoft Azure
Ahmed Salem
Microsoft
Ronen Eldan
Microsoft
Abstract
Large Language Models (LLMs) have risen significantly in popularity and are increasingly being adopted across multiple applications. These LLMs are heavily aligned to resist engaging in illegal or unethical topics as a means to avoid contributing to responsible AI harms. However, a recent line of attacks, known as "jailbreaks", seeks to overcome this alignment. Intuitively, jailbreak attacks aim to narrow the gap between what the model can do and what it is willing to do. In this paper, we introduce a novel jailbreak attack called Crescendo. Unlike existing jailbreak methods, Crescendo is a simple multi-turn jailbreak that interacts with the model in a seemingly benign manner. It begins with a general prompt or question about the task at hand and then gradually escalates the dialogue by referencing the model's replies, progressively leading to a successful jailbreak. We evaluate Crescendo on various public systems, including ChatGPT, Gemini Pro, Gemini-Ultra, LLaMA-2 70b and LLaMA-3 70b Chat, and Anthropic Chat. Our results demonstrate the strong efficacy of Crescendo, achieving high attack success rates across all evaluated models and tasks. Furthermore, we present Crescendomation¹, a tool that automates the Crescendo attack, and demonstrate its efficacy against state-of-the-art models through our evaluations. Crescendomation surpasses other state-of-the-art jailbreaking techniques on the AdvBench subset dataset, achieving 29-61% higher performance on GPT-4 and 49-71% on Gemini-Pro. Finally, we also demonstrate Crescendo's ability to jailbreak multimodal models.
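The escalation strategy the abstract describes can be sketched as a loop that starts from a benign seed prompt and builds each follow-up turn on the model's own previous reply. This is a minimal illustrative sketch, not the paper's actual implementation (which ships as Crescendomation in PyRIT); `ask_model` and the escalation phrasing are hypothetical stand-ins.

```python
def ask_model(history):
    """Hypothetical stand-in for a chat-completion API call.

    A real run would send `history` (a list of role/content
    messages) to an LLM endpoint and return its reply text.
    """
    return f"reply-to-turn-{len(history) // 2 + 1}"


def crescendo(seed_prompt, escalations):
    """Multi-turn escalation: begin with a general, benign prompt,
    then have each follow-up turn reference the model's last reply,
    gradually steering the dialogue toward the target task."""
    history = [{"role": "user", "content": seed_prompt}]
    replies = []
    for step in escalations:
        reply = ask_model(history)
        replies.append(reply)
        history.append({"role": "assistant", "content": reply})
        # Each follow-up explicitly builds on the model's own output,
        # which is what keeps every individual turn looking benign.
        follow_up = f"Building on what you just said, {step}"
        history.append({"role": "user", "content": follow_up})
    return replies


turns = crescendo("Tell me about the history of X.",
                  ["expand on the first point.",
                   "give more concrete detail."])
```

The key design point, per the abstract, is that no single turn contains an overtly harmful request; the escalation emerges only across the whole dialogue.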
Disclaimer: This paper contains examples of harmful and offensive language; reader discretion is advised.
1 Introduction
Recent advancements in large language models (LLMs) have fueled their adoption into the products of numerous companies, including Microsoft, Google, and OpenAI. Concurrently, multiple research studies have been examining the security [28, 36] and privacy risks [9, 17, 23, 29] associated with these LLMs. One of the most notable security threats is the concept of "jailbreaks". Most LLMs are safety-aligned [8, 15, 19, 25], meaning they are trained to avoid performing illegal or unethical tasks or generating harmful content in general. Jailbreak attacks aim to disrupt this alignment, enabling LLMs to execute arbitrary malicious tasks.

There are various forms of jailbreaks. For instance, optimization-based jailbreaks [21, 36] involve adversaries optimizing a suffix to circumvent the model's safety measures. These methods mostly require white-box access to the target LLMs, rendering them ineffective against black-box mode

¹Crescendomation is available as part of PyRIT (https://github.com/Azure/PyRIT) and is also accessible on Zenodo (https://zenodo.org/records/14673783).

... (truncated, 70 KB total)