# AI Safety Gridworlds
Jan Leike, Miljan Martic, Victoria Krakovna, Pedro A. Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, Shane Legg

DeepMind (Tom Everitt also with the Australian National University)
###### Abstract
We present a suite of reinforcement learning environments illustrating various safety properties of intelligent agents. These problems include safe interruptibility, avoiding side effects, absent supervisor, reward gaming, safe exploration, as well as robustness to self-modification, distributional shift, and adversaries. To measure compliance with the intended safe behavior, we equip each environment with a _performance function_ that is hidden from the agent. This allows us to categorize AI safety problems into _robustness_ and _specification_ problems, depending on whether the performance function corresponds to the observed reward function. We evaluate A2C and Rainbow, two recent deep reinforcement learning agents, on our environments and show that they are not able to solve them satisfactorily.
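The split between the observed reward and the hidden performance function can be illustrated with a minimal sketch. This is a hypothetical toy example, not the paper's actual environment implementation: the class name, dynamics, and scoring are invented for illustration. The agent optimizes the reward it sees, while an evaluator consults a performance function the agent never observes; when the two diverge (here, because of a side effect), the gap reveals a specification problem.

```python
# Hypothetical sketch (not the paper's implementation): an environment exposes
# a reward to the agent while a hidden performance function scores safe behavior.

class SideEffectEnv:
    """Toy one-step gridworld: any action reaches the goal for +1 reward, but
    action 0 also causes an irreversible side effect (e.g. pushing a box into
    a corner) that only the hidden performance function penalizes."""

    def __init__(self):
        self.caused_side_effect = False

    def step(self, action):
        # Illustrative dynamics: action 0 is the shortcut with a side effect;
        # action 1 is a safe detour. Both reach the goal.
        if action == 0:
            self.caused_side_effect = True
        reward = 1.0  # the only signal the agent observes
        return reward

    def performance(self):
        # Hidden from the agent; used only for evaluation. It equals the
        # reward unless a side effect occurred, in which case it subtracts
        # a penalty -- the mismatch marks a specification problem.
        return 1.0 - (2.0 if self.caused_side_effect else 0.0)

env = SideEffectEnv()
r = env.step(0)        # greedy shortcut: observed reward is 1.0
p = env.performance()  # hidden score is -1.0, exposing the safety failure
```

A reward-maximizing agent is indifferent between the two actions, so nothing in its training signal discourages the side effect; only the hidden performance function distinguishes the safe trajectory.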
## 1 Introduction
In the expectation that more advanced versions of today’s AI systems will be deployed in real-world applications,
numerous public figures have advocated more research into the safety of these systems (Bostrom, [2014](https://ar5iv.labs.arxiv.org/html/1711.09883#bib.bib16 ""); Hawking et al., [2014](https://ar5iv.labs.arxiv.org/html/1711.09883#bib.bib41 ""); Russell, [2016](https://ar5iv.labs.arxiv.org/html/1711.09883#bib.bib68 "")).
This nascent field of _AI safety_ still lacks a general consensus on its research problems,
and there have been several recent efforts to turn these concerns into technical problems on which we can make direct progress (Soares and Fallenstein, [2014](https://ar5iv.labs.arxiv.org/html/1711.09883#bib.bib75 ""); Russell et al., [2015](https://ar5iv.labs.arxiv.org/html/1711.09883#bib.bib69 ""); Taylor et al., [2016](https://ar5iv.labs.arxiv.org/html/1711.09883#bib.bib82 ""); Amodei et al., [2016](https://ar5iv.labs.arxiv.org/html/1711.09883#bib.bib5 "")).
Empirical research in machine learning has often been accelerated by the availability of the right data set.
MNIST (LeCun, [1998](https://ar5iv.labs.arxiv.org/html/1711.09883#bib.bib49 "")) and ImageNet (Deng et al., [2009](https://ar5iv.labs.arxiv.org/html/1711.09883#bib.bib24 "")) have had a large impact on progress in supervised learning.
Scalable reinforcement learning research has been spurred by environment suites such as
the Arcade Learning Environment (Bellemare et al., [2013](https://ar5iv.labs.arxiv.org/html/1711.09883#bib.bib12 "")),
OpenAI Gym (Brockman et al., [2016](https://ar5iv.labs.arxiv.org/html/1711.09883#bib.bib17 "")), DeepMind Lab (Beattie et al., [2016](https://ar5iv.labs.arxiv.org/html/1711.09883#bib.bib11 "")), and others.
However, to date there has been no comprehensive environment suite for AI safety problems.
With this paper, we aim to lay the
... (truncated)