Longterm Wiki

Competition-level code generation with AlphaCode

paper

Authors

Yujia Li·David Choi·Junyoung Chung·Nate Kushman·Julian Schrittwieser·Rémi Leblond·Tom Eccles·James Keeling·Felix Gimeno·Agustin Dal Lago·Thomas Hubert·Peter Choy·Cyprien de Masson d'Autume·Igor Babuschkin·Xinyun Chen·Po-Sen Huang·Johannes Welbl·Sven Gowal·Alexey Cherepanov·James Molloy·Daniel J. Mankowitz·Esme Sutherland Robson·Pushmeet Kohli·Nando de Freitas·Koray Kavukcuoglu·Oriol Vinyals

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

A landmark DeepMind paper demonstrating that large language models can solve competitive programming problems requiring non-trivial algorithmic reasoning, relevant to tracking frontier AI capabilities in code generation and automated software development.

Paper Details

Citations
662
176 influential
Year
2022
Methodology
peer-reviewed
Categories
Science

Metadata

Importance: 72/100 · arXiv preprint · primary source

Abstract

Programming is a powerful and ubiquitous problem-solving tool. Developing systems that can assist programmers or even generate programs independently could make programming more productive and accessible, yet so far incorporating innovations in AI has proven challenging. Recent large-scale language models have demonstrated an impressive ability to generate code, and are now able to complete simple programming tasks. However, these models still perform poorly when evaluated on more complex, unseen problems that require problem-solving skills beyond simply translating instructions into code. For example, competitive programming problems which require an understanding of algorithms and complex natural language remain extremely challenging. To address this gap, we introduce AlphaCode, a system for code generation that can create novel solutions to these problems that require deeper reasoning. In simulated evaluations on recent programming competitions on the Codeforces platform, AlphaCode achieved on average a ranking of top 54.3% in competitions with more than 5,000 participants. We found that three key components were critical to achieve good and reliable performance: (1) an extensive and clean competitive programming dataset for training and evaluation, (2) large and efficient-to-sample transformer-based architectures, and (3) large-scale model sampling to explore the search space, followed by filtering based on program behavior to a small set of submissions.

Summary

AlphaCode is DeepMind's system for generating solutions to competitive programming problems requiring deep algorithmic reasoning, achieving an average ranking in the top 54.3% on Codeforces competitions with 5,000+ participants. Success depends on a high-quality training dataset, large transformer architectures, and a large-scale sampling-and-filtering approach that generates many candidate solutions and selects the best based on program behavior.

Key Points

  • Achieves top 54.3% average ranking on Codeforces competitive programming contests, a significant milestone for AI code generation on complex reasoning tasks.
  • Uses large-scale sampling (generating millions of candidates) followed by filtering based on test case behavior to reduce submissions to a tractable set.
  • Training on a carefully curated, high-quality competitive programming dataset was critical; data quality mattered as much as model scale.
  • Demonstrates that transformer-based LLMs can go beyond simple instruction-to-code translation to perform genuine algorithmic problem-solving.
  • Raises AI safety-relevant questions about the pace of capability gains in code generation and potential for automated software development at scale.
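The sample-then-filter strategy described above can be sketched in miniature. This is an illustrative toy, not AlphaCode's implementation: the candidate pool, the `passes_examples` helper, and the triangular-number problem are hypothetical stand-ins for transformer sampling over millions of programs and for execution against a problem's example input/output pairs.

```python
from typing import Callable

def passes_examples(program: Callable[[int], int],
                    examples: list[tuple[int, int]]) -> bool:
    """Keep a candidate only if it reproduces every example I/O pair;
    crashing candidates are filtered out."""
    try:
        return all(program(x) == y for x, y in examples)
    except Exception:
        return False

def sample_and_filter(candidates, examples, k):
    """Filter a large candidate pool down to at most k submissions,
    mimicking the reduction from many samples to a tractable set."""
    survivors = [c for c in candidates if passes_examples(c, examples)]
    return survivors[:k]

# Toy problem: given n, return the n-th triangular number.
examples = [(1, 1), (3, 6), (4, 10)]
candidates = [
    lambda n: n * (n + 1) // 2,   # correct closed form
    lambda n: n * n,              # wrong: fails on n = 3
    lambda n: sum(range(n + 1)),  # correct iterative form
    lambda n: 1 // (n - 1),       # crashes on n = 1
]
picked = sample_and_filter(candidates, examples, k=2)
print(len(picked))  # 2 candidates survive filtering
```

In the paper this filtering step removes the vast majority of samples; the remaining behavioral clustering (grouping programs by their outputs on generated inputs) is omitted here for brevity.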

Cited by 1 page

Page | Type | Quality
Autonomous Coding | Capability | 63.0

Cached Content Preview

HTTP 200 · Fetched Apr 7, 2026 · 98 KB
[2203.07814] Competition-Level Code Generation with AlphaCode
 1 Introduction

 
Computer programming has emerged as a general-purpose problem-solving tool throughout science, industry, and daily life. As part of this growth, there has been continuously increasing demand for tools that can make programmers more productive (Matsakis and Klock, 2014), or make programming and programming education more accessible (Resnick et al., 2009). Developing AI systems that can effectively model and understand code can transform these tools and the way we interact with them. Systems that

... (truncated, 98 KB total)
Resource ID: 2137eaa69f74f139 | Stable ID: ZTg4MmFjOT