Longterm Wiki
publication

Universal and Transferable Adversarial Attacks on Aligned Language Models

Metadata

Source Table: publications
Source ID: xFD4v0FaVJ
Description: Andy Zou, Zifan Wang, Nicholas Carlini et al., 2023
Source URL: llm-attacks.org/
Parent: Center for AI Safety (CAIS)
Children:
Created: Mar 23, 2026, 2:46 PM
Updated: Mar 23, 2026, 2:46 PM
Synced: Mar 23, 2026, 2:46 PM

Record Data

id: xFD4v0FaVJ
entityId: Center for AI Safety (CAIS) (organization)
entityDisplayName:
resourceId:
title: Universal and Transferable Adversarial Attacks on Aligned Language Models
authors: Andy Zou, Zifan Wang, Nicholas Carlini et al.
url: llm-attacks.org/
venue:
publishedDate: 2023
publicationType: paper
citationCount:
isFlagship: Yes
abstract:
source: llm-attacks.org/
notes: Highly influential jailbreaking paper

Source Check Verdicts

confirmed (95% confidence)

Last checked: 4/3/2026

The source text confirms all key fields in the record. The title matches exactly. The listed authors (Andy Zou, Zifan Wang, Nicholas Carlini et al.) are confirmed: the source shows these three plus three additional authors (Milad Nasr, J. Zico Kolter, Matt Fredrikson), so the 'et al.' notation is appropriate and accurate. The publication year 2023 is confirmed by the arXiv identifier (2307.15043, i.e., July 2023). The URL https://llm-attacks.org/ is explicitly shown as the website hosting this research. The publication type 'paper' is confirmed by the explicit '[Paper]' link to arxiv.org/abs/2307.15043. All fields are directly supported by the source text.
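The date inference above relies on the fact that modern arXiv identifiers encode their submission date in a YYMM prefix (so 2307.15043 means July 2023). A minimal sketch of that check, with a hypothetical helper name:

```python
def arxiv_id_date(arxiv_id: str) -> tuple[int, int]:
    """Parse the YYMM prefix of a modern (post-2007) arXiv identifier
    into a (year, month) tuple. Hypothetical helper for illustration."""
    yymm = arxiv_id.split(".")[0]
    year = 2000 + int(yymm[:2])   # '23' -> 2023
    month = int(yymm[2:4])        # '07' -> 7 (July)
    return year, month

print(arxiv_id_date("2307.15043"))  # (2023, 7)
```

This only applies to new-style identifiers (YYMM.number); old-style IDs such as `hep-th/9901001` use a different scheme.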
