Longterm Wiki

Concrete Problems in AI Safety

paper

Authors

Dario Amodei·Chris Olah·Jacob Steinhardt·Paul Christiano·John Schulman·Dan Mané

Credibility Rating

Good (3/5)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

Data Status

Not fetched

Abstract

Rapid progress in machine learning and artificial intelligence (AI) has brought increasing attention to the potential impacts of AI technologies on society. In this paper we discuss one such potential impact: the problem of accidents in machine learning systems, defined as unintended and harmful behavior that may emerge from poor design of real-world AI systems. We present a list of five practical research problems related to accident risk, categorized according to whether the problem originates from having the wrong objective function ("avoiding side effects" and "avoiding reward hacking"), an objective function that is too expensive to evaluate frequently ("scalable supervision"), or undesirable behavior during the learning process ("safe exploration" and "distributional shift"). We review previous work in these areas as well as suggesting research directions with a focus on relevance to cutting-edge AI systems. Finally, we consider the high-level question of how to think most productively about the safety of forward-looking applications of AI.

Cited by 8 pages

Cached Content Preview

HTTP 200 · Fetched Feb 23, 2026 · 98 KB
[1606.06565] Concrete Problems in AI Safety 

Concrete Problems in AI Safety

Dario Amodei* (Google Brain) · Chris Olah* (Google Brain) · Jacob Steinhardt (Stanford University) · Paul Christiano (UC Berkeley) · John Schulman (OpenAI) · Dan Mané (Google Brain)

*These authors contributed equally.

 1 Introduction

 
The last few years have seen rapid progress on long-standing, difficult problems in machine learning and artificial intelligence (AI), in areas as diverse as computer vision [82], video game playing [102], autonomous vehicles [86], and Go [140]. These advances have brought excitement about the positive potential for AI to transform medicine [126], science [59], and transportation [86], along with concerns about the privacy [76], security [115], fairness [3], economic [32], and military [16] implications of autonomous systems, as well as concerns about the longer-term implications of powerful AI [27, 167].

 
 
 The authors believe that AI technologies are likely to be overwhelmingly beneficial for humanity, but we also believe that it is worth giving serious thought to potential challenges and risks. We strongly support work on privacy, security, fairness, economics, and policy, but in this document we discuss another class of problem which we believe is also relevant to the societal impacts of AI: the problem of accidents in machine learning systems. We define accidents as unintended and harmful behavior that may emerge from machine learning systems when we specify the wrong objective function, are not careful about the learning process, or commit other machine learning-related implementation errors.
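
The "wrong objective function" failure mode the authors describe can be illustrated with a toy sketch (not from the paper; the cleaning-robot scenario and all numbers here are hypothetical): an agent that greedily maximizes a proxy reward can end up choosing an action that scores poorly on the designer's true objective.

```python
# Hypothetical illustration of a misspecified objective: the proxy reward
# ("visible dust collected") diverges from the true objective ("a clean,
# undamaged room"), so the proxy-maximizing action is harmful.

def proxy_reward(action: str) -> float:
    # The agent is scored on dust swept up; it can inflate this by
    # knocking over a plant and sweeping up the spilled soil.
    return {"clean_room": 5.0, "make_mess_then_clean": 9.0}[action]

def true_objective(action: str) -> float:
    # What the designer actually wanted.
    return {"clean_room": 5.0, "make_mess_then_clean": -3.0}[action]

actions = ["clean_room", "make_mess_then_clean"]
chosen = max(actions, key=proxy_reward)

print(chosen)                 # -> make_mess_then_clean
print(true_objective(chosen)) # -> -3.0
```

The gap between `proxy_reward` and `true_objective` is the crux of both "avoiding side effects" and "avoiding reward hacking": the agent behaves exactly as specified, and the accident arises from the specification itself.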

 
 
 There is a large and diverse literature in the machine learning community on issues related to accidents

... (truncated, 98 KB total)
Resource ID: cd3035dbef6c7b5b | Stable ID: M2JlZDZhMj