Universal and Transferable Adversarial Attacks on Aligned Language Models
Metadata
| Field | Value |
| --- | --- |
| Source Table | publications |
| Source ID | xFD4v0FaVJ |
| Description | Andy Zou, Zifan Wang, Nicholas Carlini et al., 2023 |
| Source URL | llm-attacks.org/ |
| Parent | Center for AI Safety (CAIS) |
| Children | — |
| Created | Mar 23, 2026, 2:46 PM |
| Updated | Mar 23, 2026, 2:46 PM |
| Synced | Mar 23, 2026, 2:46 PM |
Record Data
| Field | Value |
| --- | --- |
| id | xFD4v0FaVJ |
| entityId | Center for AI Safety (CAIS) (organization) |
| entityDisplayName | — |
| resourceId | — |
| title | Universal and Transferable Adversarial Attacks on Aligned Language Models |
| authors | Andy Zou, Zifan Wang, Nicholas Carlini et al. |
| url | llm-attacks.org/ |
| venue | — |
| publishedDate | 2023 |
| publicationType | paper |
| citationCount | — |
| isFlagship | Yes |
| abstract | — |
| source | llm-attacks.org/ |
| notes | Highly influential jailbreaking paper |
Source Check Verdicts
Last checked: 4/3/2026
The source text confirms all key fields in the record. The title matches exactly. The listed authors (Andy Zou, Zifan Wang, Nicholas Carlini et al.) are confirmed; the source shows these three plus three additional authors (Milad Nasr, J. Zico Kolter, Matt Fredrikson), so the 'et al.' notation is appropriate and accurate. The publication year 2023 is confirmed by the arXiv ID (2307.15043, i.e., July 2023). The URL https://llm-attacks.org/ is explicitly shown as the website hosting this research. The publication type 'paper' is confirmed by the explicit '[Paper]' link to arxiv.org/abs/2307.15043. All fields are directly supported by the source text.