Concrete Problems in AI Safety
paperAuthors
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: arXiv
Data Status
Abstract
Rapid progress in machine learning and artificial intelligence (AI) has brought increasing attention to the potential impacts of AI technologies on society. In this paper we discuss one such potential impact: the problem of accidents in machine learning systems, defined as unintended and harmful behavior that may emerge from poor design of real-world AI systems. We present a list of five practical research problems related to accident risk, categorized according to whether the problem originates from having the wrong objective function ("avoiding side effects" and "avoiding reward hacking"), an objective function that is too expensive to evaluate frequently ("scalable supervision"), or undesirable behavior during the learning process ("safe exploration" and "distributional shift"). We review previous work in these areas as well as suggesting research directions with a focus on relevance to cutting-edge AI systems. Finally, we consider the high-level question of how to think most productively about the safety of forward-looking applications of AI.
Cited by 8 pages
| Page | Type | Quality |
|---|---|---|
| Long-Horizon Autonomous Tasks | Capability | 65.0 |
| Deep Learning Revolution Era | Historical | 44.0 |
| AI Compounding Risks Analysis Model | Analysis | 60.0 |
| Reward Hacking Taxonomy and Severity Model | Analysis | 71.0 |
| Safety-Capability Tradeoff Model | Analysis | 64.0 |
| AI Safety Research Value Model | Analysis | 60.0 |
| AI Alignment | Approach | 91.0 |
| AI Doomer Worldview | Concept | 38.0 |
Cached Content Preview
[1606.06565] Concrete Problems in AI Safety
Concrete Problems in AI Safety
Dario Amodei
Google Brain
These authors contributed equally.
Chris Olah † † footnotemark:
Google Brain
Jacob Steinhardt
Stanford University
Paul Christiano
UC Berkeley
John Schulman
OpenAI
Dan Mané
Google Brain
Abstract
Rapid progress in machine learning and artificial intelligence (AI) has brought increasing attention to the potential impacts of AI technologies on society. In this paper we discuss one such potential impact: the problem of accidents in machine learning systems, defined as unintended and harmful behavior that may emerge from poor design of real-world AI systems. We present a list of five practical research problems related to accident risk, categorized according to whether the problem originates from having the wrong objective function (“avoiding side effects” and “avoiding reward hacking”), an objective function that is too expensive to evaluate frequently (“scalable supervision”), or undesirable behavior during the learning process (“safe exploration” and “distributional shift”). We review previous work in these areas as well as suggesting research directions with a focus on relevance to cutting-edge AI systems. Finally, we consider the high-level question of how to think most productively about the safety of forward-looking applications of AI.
1 Introduction
The last few years have seen rapid progress on long-standing, difficult problems in machine learning and artificial intelligence (AI), in areas as diverse as computer vision [ 82 ] , video game playing [ 102 ] , autonomous vehicles [ 86 ] , and Go [ 140 ] . These advances have brought excitement about the positive potential for AI to transform medicine [ 126 ] , science [ 59 ] , and transportation [ 86 ] , along with concerns about the privacy [ 76 ] , security [ 115 ] , fairness [ 3 ] , economic [ 32 ] , and military [ 16 ] implications of autonomous systems, as well as concerns about the longer-term implications of powerful AI [ 27 , 167 ] .
The authors believe that AI technologies are likely to be overwhelmingly beneficial for humanity, but we also believe that it is worth giving serious thought to potential challenges and risks. We strongly support work on privacy, security, fairness, economics, and policy, but in this document we discuss another class of problem which we believe is also relevant to the societal impacts of AI: the problem of accidents in machine learning systems. We define accidents as unintended and harmful behavior that may emerge from machine learning systems when we specify the wrong objective function, are not careful about the learning process, or commit other machine learning-related implementation errors.
There is a large and diverse literature in the machine learning community on issues related to accidents
... (truncated, 98 KB total)cd3035dbef6c7b5b | Stable ID: M2JlZDZhMj