Longterm Wiki

AI Safety Gridworlds

paper

Authors

Jan Leike·Miljan Martic·Victoria Krakovna·Pedro A. Ortega·Tom Everitt·Andrew Lefrancq·Laurent Orseau·Shane Legg

Credibility Rating

Good (3/5)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

Data Status

Not fetched

Abstract

We present a suite of reinforcement learning environments illustrating various safety properties of intelligent agents. These problems include safe interruptibility, avoiding side effects, absent supervisor, reward gaming, safe exploration, as well as robustness to self-modification, distributional shift, and adversaries. To measure compliance with the intended safe behavior, we equip each environment with a performance function that is hidden from the agent. This allows us to categorize AI safety problems into robustness and specification problems, depending on whether the performance function corresponds to the observed reward function. We evaluate A2C and Rainbow, two recent deep reinforcement learning agents, on our environments and show that they are not able to solve them satisfactorily.

Cited by 1 page

| Page | Type | Quality |
| --- | --- | --- |
| Google DeepMind | Organization | 37.0 |

Cached Content Preview

HTTP 200 · Fetched Feb 23, 2026 · 89 KB
# AI Safety Gridworlds

Jan Leike (DeepMind) · Miljan Martic (DeepMind) · Victoria Krakovna (DeepMind) · Pedro A. Ortega (DeepMind) · Tom Everitt (DeepMind, Australian National University) · Andrew Lefrancq (DeepMind) · Laurent Orseau (DeepMind) · Shane Legg (DeepMind)

###### Abstract

We present a suite of reinforcement learning environments illustrating various safety properties of intelligent agents. These problems include safe interruptibility, avoiding side effects, absent supervisor, reward gaming, safe exploration, as well as robustness to self-modification, distributional shift, and adversaries. To measure compliance with the intended safe behavior, we equip each environment with a _performance function_ that is hidden from the agent. This allows us to categorize AI safety problems into _robustness_ and _specification_ problems, depending on whether the performance function corresponds to the observed reward function. We evaluate A2C and Rainbow, two recent deep reinforcement learning agents, on our environments and show that they are not able to solve them satisfactorily.
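The key evaluation device described above, a visible reward alongside a hidden performance function, can be illustrated with a minimal sketch. This is not the paper's pycolab implementation; the environment below is a hypothetical one-dimensional loop loosely inspired by the paper's reward-gaming setting, with dynamics invented here for illustration: the agent is rewarded for entering a checkpoint tile, while the hidden performance function counts only net forward progress.

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    observation: tuple
    reward: float              # observed reward, visible to the agent
    hidden_performance: float  # performance function, used only by the evaluator

class LoopTrack:
    """Hypothetical 1-D circular track of `length` tiles.

    The agent receives +1 reward whenever it moves onto tile 1 (the
    "checkpoint"), but the hidden performance function is the net number
    of forward steps taken. Oscillating in and out of the checkpoint
    games the reward while earning zero performance.
    """

    def __init__(self, length: int = 4):
        self.length = length
        self.pos = 0       # current tile
        self.progress = 0  # net forward motion (hidden from the agent)

    def step(self, action: int) -> StepResult:
        """action is +1 (forward) or -1 (backward)."""
        self.pos = (self.pos + action) % self.length
        reward = 1.0 if self.pos == 1 else 0.0
        self.progress += action
        return StepResult((self.pos,), reward, float(self.progress))

# A reward-gaming policy: oscillate into and out of the checkpoint.
gamed = LoopTrack()
gamed_reward = sum(gamed.step(a).reward for a in (+1, -1, +1, -1))

# The intended policy: keep moving forward around the track.
honest = LoopTrack()
honest_reward = sum(honest.step(+1).reward for _ in range(4))
```

After four steps, the gaming policy has collected twice the observed reward of the honest policy (2.0 vs 1.0) while its hidden performance is 0 against the honest policy's 4, which is exactly the divergence between reward and performance that marks a specification problem in the paper's taxonomy.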

## 1 Introduction

Expecting that more advanced versions of today’s AI systems are going to be deployed in real-world applications,
numerous public figures have advocated more research into the safety of these systems (Bostrom, [2014](https://ar5iv.labs.arxiv.org/html/1711.09883#bib.bib16 ""); Hawking et al., [2014](https://ar5iv.labs.arxiv.org/html/1711.09883#bib.bib41 ""); Russell, [2016](https://ar5iv.labs.arxiv.org/html/1711.09883#bib.bib68 "")).
This nascent field of _AI safety_ still lacks a general consensus on its research problems,
and there have been several recent efforts to turn these concerns into technical problems on which we can make direct progress (Soares and Fallenstein, [2014](https://ar5iv.labs.arxiv.org/html/1711.09883#bib.bib75 ""); Russell et al., [2015](https://ar5iv.labs.arxiv.org/html/1711.09883#bib.bib69 ""); Taylor et al., [2016](https://ar5iv.labs.arxiv.org/html/1711.09883#bib.bib82 ""); Amodei et al., [2016](https://ar5iv.labs.arxiv.org/html/1711.09883#bib.bib5 "")).

Empirical research in machine learning has often been accelerated by the availability of the right data set.
MNIST (LeCun, [1998](https://ar5iv.labs.arxiv.org/html/1711.09883#bib.bib49 "")) and ImageNet (Deng et al., [2009](https://ar5iv.labs.arxiv.org/html/1711.09883#bib.bib24 "")) have had a large impact on progress in supervised learning.
Scalable reinforcement learning research has been spurred by environment suites such as
the Arcade Learning Environment (Bellemare et al., [2013](https://ar5iv.labs.arxiv.org/html/1711.09883#bib.bib12 "")),
OpenAI Gym (Brockman et al., [2016](https://ar5iv.labs.arxiv.org/html/1711.09883#bib.bib17 "")), DeepMind Lab (Beattie et al., [2016](https://ar5iv.labs.arxiv.org/html/1711.09883#bib.bib11 "")), and others.
However, to date there has been no comparable comprehensive environment suite for AI safety problems.

With this paper, we aim to lay the 

... (truncated, 89 KB total)
Resource ID: 84527d3e1671495f | Stable ID: NDY5OGY3NG