AI Safety Gridworlds
Web Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: GitHub
A foundational DeepMind benchmark suite (2017) for evaluating RL agent safety properties; archived in 2023 but remains a standard reference for alignment researchers studying concrete safety failure modes in toy environments.
Metadata
Summary
AI Safety Gridworlds is a suite of reinforcement learning environments from DeepMind designed to test and evaluate AI safety properties such as safe interruptibility, avoiding side effects, reward hacking, and distributional shift. Each gridworld scenario isolates a specific safety challenge, providing a standardized benchmark for safety research. The repository is now archived but remains a widely-cited foundational resource in the AI safety literature.
Key Points
- Provides a collection of toy RL environments, each targeting a distinct AI safety problem (e.g., safe interruptibility, side-effect avoidance, reward gaming).
- Distinguishes a visible 'performance' reward from a separate 'safety' score, allowing agents to be evaluated on task completion and safety criteria independently.
- Accompanied by the paper 'AI Safety Gridworlds' (Leike et al., 2017), which formalizes several key safety desiderata for RL agents.
- Archived in 2023 but still widely used as a benchmark and reference point in AI safety evaluation research.
- Supports reproducible, minimal environments that make it easier to isolate and study individual alignment failure modes.
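The performance-vs-safety split described above can be illustrated with a minimal sketch. The names here (`EpisodeResult`, `evaluate`) are hypothetical, not the repository's actual API; the point is only that the agent optimizes a visible reward while a hidden safety score is tracked separately by the evaluator.

```python
# Hypothetical sketch of the visible-reward vs. hidden-safety evaluation split.
# These names are illustrative only, not the ai-safety-gridworlds API.
from dataclasses import dataclass

@dataclass
class EpisodeResult:
    visible_reward: float  # what the agent sees and optimizes (task completion)
    hidden_safety: float   # safety score the agent never observes

def evaluate(results):
    """Aggregate both metrics separately, as the benchmark design intends."""
    n = len(results)
    mean_reward = sum(r.visible_reward for r in results) / n
    mean_safety = sum(r.hidden_safety for r in results) / n
    return mean_reward, mean_safety

# Example: an agent that scores well on the visible reward while
# performing badly on the hidden safety criterion (e.g., reward gaming).
episodes = [
    EpisodeResult(visible_reward=10.0, hidden_safety=-5.0),
    EpisodeResult(visible_reward=8.0, hidden_safety=-3.0),
]
perf, safety = evaluate(episodes)
print(perf, safety)  # 9.0 -4.0
```

Reporting the two numbers separately is what lets the benchmark expose agents that succeed at the task while violating the safety objective.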
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| AI Knowledge Monopoly | Risk | 50.0 |
Cached Content Preview
GitHub - google-deepmind/ai-safety-gridworlds: This is a suite of reinforcement learning environments illustrating various safety properties of intelligent agents. · GitHub
This repository was archived by the owner on Jul 21, 2023. It is now read-only.
AI safety gridworlds
This is a suite of reinforcement learning environments illustrating various safety properties of intelligent agents. These environments are implemented in pycolab, a highly-customisable gridworld game engine with some batteries included.
For more information, see the accompanying research paper.
For the latest list of changes, see CHANGES.md.
Instructions
1. Open a new terminal window (iterm2 on Mac, gnome-terminal or xterm on Linux work best; avoid tmux/screen).
2. Set the terminal colours to xterm-256color by running `export TERM=xterm-256color`.
3. Clone the repository using `git clone https://github.com/deepmind/ai-safety-gridworlds.git`.
4. Choose an environment from the list below and run it by typing `PYTHONPATH=. python -B ai_safety_gridworlds/environments/ENVIRONMENT_NAME.py`.
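The setup steps above can be collected into a single shell session. `safe_interruptibility` is used here as one example environment name; substitute any environment from the repository's list.

```shell
# Setup sketch following the README instructions (Linux/macOS).
export TERM=xterm-256color
git clone https://github.com/deepmind/ai-safety-gridworlds.git
cd ai-safety-gridworlds
# ENVIRONMENT_NAME is a placeholder; safe_interruptibility is one example.
PYTHONPATH=. python -B ai_safety_gridworlds/environments/safe_interruptibility.py
```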
Dependencies
- Python 2 (with enum34 support) or Python 3. We tested it with all the commonly used Python minor versions (2.7, 3.4, 3.5, 3.6). Note that version 2.7.15 might have curses rendering issues in a terminal.
- Pycolab, the gridworlds game engine we use.
- NumPy. Our version is 1.14.5. Note that higher versions don't work with pip TensorFlow at the moment.
- Abseil Python common libraries.

If you intend to contribute and run the test suite, you will also need TensorFlow, as pycolab relies on it for testing.
We also recommend using a virtual environment. Under the assumption that you have the virtualen
... (truncated, 8 KB total) | Stable ID: sid_2x0U5UgFEL