Longterm Wiki

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: GitHub

A foundational DeepMind benchmark suite (2017) for evaluating RL agent safety properties; archived in 2023 but remains a standard reference for alignment researchers studying concrete safety failure modes in toy environments.

Metadata

Importance: 72/100 · Type: tool page

Summary

AI Safety Gridworlds is a suite of reinforcement learning environments from DeepMind designed to test and evaluate AI safety properties such as safe interruptibility, avoiding side effects, reward hacking, and distributional shift. Each gridworld scenario isolates a specific safety challenge, providing a standardized benchmark for safety research. The repository is now archived but remains a widely-cited foundational resource in the AI safety literature.

Key Points

  • Provides a collection of toy RL environments, each targeting a distinct AI safety problem (e.g., safe interruptibility, side-effect avoidance, reward gaming).
  • Distinguishes the agent-visible reward function from a hidden 'performance' function, so agents can be evaluated separately on task completion and on safety criteria.
  • Accompanied by the paper 'AI Safety Gridworlds' (Leike et al., 2017), which formalizes several key safety desiderata for RL agents.
  • Archived in 2023 but still widely used as a benchmark and reference point in AI safety evaluation research.
  • Supports reproducible, minimal environments that make it easier to isolate and study individual alignment failure modes.
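The visible-versus-hidden reward split mentioned above can be illustrated with a toy example. This is a hypothetical sketch, not the suite's actual API: the agent optimises only the visible reward, while a hidden performance score additionally penalises unsafe side effects, mirroring the object-breaking setups in the paper.

```python
# Hypothetical sketch of the visible-reward vs. hidden-performance split --
# NOT the real ai_safety_gridworlds API. The agent sees only the visible
# reward; the hidden performance score also charges for side effects.

VASE, GOAL = (2, 0), (4, 0)   # cells in a tiny 2-D grid

def episode_scores(path):
    """Score a walked path: returns (visible_reward, hidden_performance)."""
    visible = penalty = 0.0
    for cell in path:
        if cell == VASE:       # unsafe side effect, invisible to the agent
            penalty -= 5.0
        if cell == GOAL:       # reaching the goal pays visible reward
            visible += 10.0
            break
    return visible, visible + penalty

# The shortest path tramples the vase; the detour avoids it.
direct = [(1, 0), (2, 0), (3, 0), (4, 0)]
detour = [(1, 0), (1, 1), (2, 1), (3, 1), (3, 0), (4, 0)]

print(episode_scores(direct))  # (10.0, 5.0): full reward, worse performance
print(episode_scores(detour))  # (10.0, 10.0): same reward, safe behaviour
```

A reward-maximising agent is indifferent between the two paths; only the hidden performance score distinguishes them, which is exactly the evaluation gap the gridworlds are designed to expose.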

Cited by 1 page

Page: AI Knowledge Monopoly · Type: Risk · Quality: 50.0

Cached Content Preview

HTTP 200 · Fetched Apr 9, 2026 · 8 KB
GitHub - google-deepmind/ai-safety-gridworlds: This is a suite of reinforcement learning environments illustrating various safety properties of intelligent agents.
This repository was archived by the owner on Jul 21, 2023. It is now read-only.

google-deepmind / ai-safety-gridworlds (Public archive) · 631 stars · 125 forks

 AI safety gridworlds

 
This is a suite of reinforcement learning environments illustrating various safety properties of intelligent agents. These environments are implemented in pycolab, a highly-customisable gridworld game engine with some batteries included.

For more information, see the accompanying research paper.

For the latest list of changes, see CHANGES.md.

 Instructions

 
 
1. Open a new terminal window (iterm2 on Mac; gnome-terminal or xterm on Linux work best; avoid tmux/screen).

2. Set the terminal colours to xterm-256color by running export TERM=xterm-256color.

3. Clone the repository using git clone https://github.com/deepmind/ai-safety-gridworlds.git.

4. Choose an environment from the list below and run it by typing PYTHONPATH=. python -B ai_safety_gridworlds/environments/ENVIRONMENT_NAME.py.
 
 Dependencies

 
 
  • Python 2 (with enum34 support) or Python 3. We tested it with all the commonly used Python minor versions (2.7, 3.4, 3.5, 3.6). Note that version 2.7.15 may have curses rendering issues in a terminal.

  • Pycolab, the gridworld game engine we use.

  • Numpy. Our version is 1.14.5; note that higher versions don't work with pip tensorflow at the moment.

  • Abseil Python common libraries.

  • If you intend to contribute and run the test suite, you will also need Tensorflow, as pycolab relies on it for testing.
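Assuming these dependencies map to their usual PyPI distribution names (an assumption worth verifying; pin versions as appropriate), the list above might translate into a requirements file along these lines:

```
pycolab
numpy==1.14.5
absl-py
# Only needed to run the test suite:
# tensorflow
```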

 
 We also recommend using a virtual environment. Under the assumption that you have the virtualen

... (truncated, 8 KB total)
Resource ID: 64f41b0780d481a9 | Stable ID: sid_2x0U5UgFEL