Improving Alignment and Robustness with Circuit Breakers
All key fields in the record are confirmed by the source text: (1) Title matches exactly; (2) Authors Andy Zou, Long Phan, and Justin Wang are confirmed as the first three authors (et al. appropriately represents the remaining 7 authors); (3) Published date 2024 is confirmed (submitted June 6, 2024); (4) URL https://arxiv.org/abs/2406.04313 matches the arXiv identifier 2406.04313 provided in the source; (5) Publication type is confirmed as an arXiv paper in Machine Learning (cs.LG). No contradictions detected.
Our claim
entire record
- Title: Improving Alignment and Robustness with Circuit Breakers
- Authors: Andy Zou, Long Phan, Justin Wang et al.
- Published Date: 2024
- Publication Type: paper
- Is Flagship: No
- Notes: ICML 2024
Source evidence
1 src · 1 check