Longterm Wiki

[1912.06680] Dota 2 with Large Scale Deep Reinforcement Learning

paper

Authors

OpenAI (Christopher Berner et al.; authors listed alphabetically)

Credibility Rating

3/5
Good (3)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

A landmark capabilities paper relevant to AI safety discussions about the rapid progression of RL systems, the role of scale in AI performance, and the challenges of aligning highly capable agents trained via self-play in complex environments.

Paper Details

Citations
2,088
97 influential
Year
2019

Metadata

Importance: 62/100 · arxiv preprint · primary source

Summary

Describes OpenAI Five, a deep reinforcement learning system that achieved superhuman performance in the complex real-time strategy game Dota 2, defeating the world champion team. The paper details the training infrastructure, algorithmic choices, and scaling laws that enabled this milestone, using roughly 45,000 years of self-play experience. It serves as a landmark demonstration of what large-scale RL can achieve in long-horizon, partially observable, multi-agent environments.
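The experience figure in the summary can be cross-checked against the throughput quoted in the paper's abstract with back-of-envelope arithmetic. The sketch below assumes Dota 2 simulates 30 frames per game-second and idealizes the 10 months as continuous training at the quoted batch rate; both are assumptions, so this is a consistency check, not a figure from the paper:

```python
# Order-of-magnitude check of the "~45,000 years of self-play" figure,
# using the abstract's throughput (~2 million frames every 2 seconds).
# Assumptions (not from the paper): 30 frames per game-second, and
# 10 months of uninterrupted training at full batch rate.
frames_per_real_second = 2_000_000 / 2
game_seconds_per_real_second = frames_per_real_second / 30
training_seconds = 10 * 30 * 24 * 3600          # ~10 months of wall clock
game_years = (game_seconds_per_real_second * training_seconds
              / (365 * 24 * 3600))
print(round(game_years))  # → 27397
```

The result lands within a factor of two of the reported 45,000 years, which is as close as this idealization can get; batch accounting, frame-skip, and parallel-game details absorb the slack.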

Key Points

  • OpenAI Five defeated the world champion Dota 2 team OG in a best-of-three series, demonstrating superhuman performance in a complex multi-agent game.
  • Training required approximately 45,000 years of self-play experience accumulated through massive distributed compute infrastructure.
  • The system uses a relatively simple LSTM-based architecture, suggesting scale and training time matter more than architectural complexity for such tasks.
  • Demonstrates emergent teamwork and coordination among five independent agents without explicit communication mechanisms.
  • Provides empirical evidence for scaling laws in RL: performance improved predictably with more compute, offering lessons for future large-scale AI systems.
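The "relatively simple LSTM-based architecture" point can be illustrated with a toy sketch: one recurrent core feeding separate policy and value heads. All dimensions below are illustrative placeholders, orders of magnitude smaller than OpenAI Five's actual network, and the observation and action processing wrapped around the real LSTM is omitted:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyLSTMPolicy:
    """Toy LSTM policy-and-value network. A single recurrent core feeds
    two linear heads: action logits and a state-value estimate. Sizes
    are illustrative stand-ins, far smaller than OpenAI Five's."""

    def __init__(self, obs_dim, hidden_dim, num_actions, seed=0):
        rng = np.random.default_rng(seed)
        h = hidden_dim
        # One weight matrix holds all four LSTM gates (input, forget,
        # output, candidate), applied to [obs; previous hidden state].
        self.W = rng.normal(0.0, 0.1, size=(4 * h, obs_dim + h))
        self.b = np.zeros(4 * h)
        self.W_pi = rng.normal(0.0, 0.1, size=(num_actions, h))  # policy head
        self.W_v = rng.normal(0.0, 0.1, size=(1, h))             # value head
        self.hidden = hidden_dim

    def step(self, obs, h, c):
        z = self.W @ np.concatenate([obs, h]) + self.b
        H = self.hidden
        i = sigmoid(z[:H])           # input gate
        f = sigmoid(z[H:2 * H])      # forget gate
        o = sigmoid(z[2 * H:3 * H])  # output gate
        g = np.tanh(z[3 * H:])       # candidate cell state
        c = f * c + i * g
        h = o * np.tanh(c)
        return self.W_pi @ h, (self.W_v @ h).item(), h, c
```

Carrying `(h, c)` forward across `step` calls is what lets a recurrent policy integrate information over long, partially observed episodes.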

Cited by 1 page

Page | Type | Quality
Deep Learning Revolution Era | Historical | 44.0

Cached Content Preview

HTTP 200 · Fetched Apr 7, 2026 · 98 KB
 Dota 2 with Large Scale Deep Reinforcement Learning

 
 
 

 
OpenAI
Authors listed alphabetically. Please cite as OpenAI et al., and use the following bibtex for citation: https://openai.com/bibtex/openai2019dota.bib
 
Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław “Psyho” Dębiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, Rafal Józefowicz, Scott Gray, Catherine Olsson, Jakub Pachocki, Michael Petrov, Henrique Pondé de Oliveira Pinto, Jonathan Raiman, Tim Salimans, Jeremy Schlatter, Jonas Schneider, Szymon Sidor, Ilya Sutskever, Jie Tang, Filip Wolski, Susan Zhang
 
 
 

 
 Abstract

 On April 13th, 2019, OpenAI Five became the first AI system to defeat the world champions at an esports game.
The game of Dota 2 presents novel challenges for AI systems such as long time horizons, imperfect information, and complex, continuous state-action spaces, all challenges which will become increasingly central to more capable AI systems.
OpenAI Five leveraged existing reinforcement learning techniques, scaled to learn from batches of approximately 2 million frames every 2 seconds.
We developed a distributed training system and tools for continual training which allowed us to train OpenAI Five for 10 months.
By defeating the Dota 2 world champion (Team OG), OpenAI Five demonstrates that self-play reinforcement learning can achieve superhuman performance on a difficult task.
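The batch throughput quoted above came from decoupling experience generation from optimization. The sketch below is a minimal, hypothetical rendering of that rollout-worker/optimizer split: placeholder tuples stand in for real transitions, a counter stands in for the PPO gradient step, and the worker counts and sizes are arbitrary rather than the paper's:

```python
import queue
import threading

def rollout_worker(experience_q, n_fragments, fragment_len=16):
    """Play self-play games with the current policy and push fixed-length
    fragments of experience. Placeholder tuples stand in for transitions."""
    for _ in range(n_fragments):
        fragment = [("obs", "action", "reward")] * fragment_len
        experience_q.put(fragment)

def optimizer(experience_q, batch_size, n_steps):
    """Assemble batches from the shared queue and apply one update per
    batch. The real system ran a PPO gradient step here."""
    n_updates = 0
    for _ in range(n_steps):
        batch = [experience_q.get() for _ in range(batch_size)]
        assert len(batch) == batch_size
        n_updates += 1  # stand-in for the actual parameter update
    return n_updates

def train(n_workers=2, fragments_per_worker=4, batch_size=2, n_steps=4):
    q = queue.Queue()
    workers = [threading.Thread(target=rollout_worker,
                                args=(q, fragments_per_worker))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    updates = optimizer(q, batch_size, n_steps)
    for w in workers:
        w.join()
    return updates
```

In the full system these roles ran on separate machines at vastly larger scale; the point of the sketch is only that producers and the consumer share nothing but a queue of experience, which is what lets each side scale independently.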

 
 
 
 1 Introduction

 
 The long-term goal of artificial intelligence is to solve advanced real-world challenges.
Games have served as stepping stones along this path for decades, from Backgammon (1992) to Chess (1997) to Atari (2013) [1, 2, 3].
In 2016, AlphaGo defeated the world champion at Go using deep reinforcement learning and Monte Carlo tree search [4].
In recent years, reinforcement learning (RL) models have tackled tasks as varied as robotic manipulation [5], text summarization [6], and video games such as StarCraft [7] and Minecraft [8].

 
 
 Relative to previous AI milestones like Chess or Go, complex video games start to capture the complexity and continuous nature of the real world.
Dota 2 is a multiplayer real-time strategy game produced by Valve Corporation in 2013, which averaged between 500,000 and 1,000,000 concurrent players between 2013 and 2019.
The game is actively played by full-time professionals; the prize pool for the 2019 international championship exceeded $35 million, the largest of any esports game in the world [9, 10].
The game presents challenges for reinforcement learning due to long time horizons, partial observability, and high dimensionality of observation and action spaces.
Dota 2’s rules are also complex — the game has been actively developed for over a decade, with game logic implemented in hundreds of thousa

... (truncated, 98 KB total)
Resource ID: 033ae49bb240e454 | Stable ID: sid_e2v3RwOarB