Longterm Wiki

Adversarial Policies Beat Superhuman Go AIs | FAR.AI

web

Data Status

Not fetched

Cited by 1 page

Page: FAR AI | Type: Organization | Quality: 76.0

Cached Content Preview

HTTP 200 | Fetched Feb 22, 2026 | 3 KB
 Adversarial Policies Beat Superhuman Go AIs
 January 9, 2023

 
 
 
 
 
Tony Wang, Adam Gleave, Tom Tseng, Kellin Pelrine, Nora Belrose, Joseph Miller, Michael D. Dennis, Yawen Duan, Viktor Pogrebniak, Sergey Levine, Stuart Russell

Abstract

 
 
 
 
We attack the state-of-the-art Go-playing AI system, KataGo, by training adversarial policies that play against frozen KataGo victims. Our attack achieves a >99% win rate when KataGo uses no tree-search, and a >77% win rate when KataGo uses enough search to be superhuman. Notably, our adversaries do not win by learning to play Go better than KataGo -- in fact, our adversaries are easily beaten by human amateurs. Instead, our adversaries win by tricking KataGo into making serious blunders. Our results demonstrate that even superhuman AI systems may harbor surprising failure modes. Example games are available at [goattack.far.ai](https://goattack.far.ai).
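The core dynamic the abstract describes -- an adversary optimized against a *frozen* victim learns to exploit a specific flaw rather than to play better overall -- can be sketched with a toy game. This is a minimal illustration under stated assumptions, not FAR.AI's actual training code: rock-paper-scissors stands in for Go, a hard-coded bias stands in for KataGo's blind spot, and a simple epsilon-greedy bandit stands in for the adversarial policy.

```python
import random

random.seed(0)

MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}  # move -> what it beats

def frozen_victim():
    """Fixed policy with an exploitable bias: plays rock 90% of the time.
    Its 'parameters' are never updated, mirroring the frozen KataGo victim."""
    return "rock" if random.random() < 0.9 else random.choice(MOVES)

class Adversary:
    """Epsilon-greedy bandit over moves; only this side learns."""
    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        self.wins = {m: 0 for m in MOVES}
        self.plays = {m: 0 for m in MOVES}

    def act(self):
        if random.random() < self.epsilon:
            return random.choice(MOVES)  # explore
        # exploit: pick the move with the best estimated win rate so far
        return max(MOVES, key=lambda m: self.wins[m] / (self.plays[m] + 1))

    def update(self, move, won):
        self.plays[move] += 1
        self.wins[move] += int(won)

adv = Adversary()
total_wins = 0
N = 5000
for _ in range(N):
    move = adv.act()
    won = BEATS[move] == frozen_victim()
    adv.update(move, won)
    total_wins += won

win_rate = total_wins / N
print(f"adversary win rate vs frozen victim: {win_rate:.2f}")
```

The adversary converges on playing paper almost exclusively: it "wins" not by being a strong rock-paper-scissors player in general, but by discovering and hammering the victim's fixed bias, analogous to how the paper's adversaries beat KataGo while remaining weak against human amateurs.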

... (truncated, 3 KB total)
Resource ID: 90d7373e3af4f090 | Stable ID: YjA3NDE4N2