Longterm Wiki

[2205.10330] A Review of Safe Reinforcement Learning: Methods, Theory and Applications

paper

Authors

Shangding Gu·Long Yang·Yali Du·Guang Chen·Florian Walter·Jun Wang·Yaodong Yang·Alois Knoll

Credibility Rating

3/5
Good (3)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

A useful survey for researchers entering safe RL; covers the gap between theoretical safety guarantees and practical deployment, relevant to AI safety work focused on ensuring RL agents behave safely in high-stakes environments like robotics and autonomous vehicles.

Paper Details

Citations
314
5 influential
Year
2022
Methodology
survey

Metadata

Importance: 68/100 · arXiv preprint · reference

Abstract

Reinforcement Learning (RL) has achieved tremendous success in many complex decision-making tasks. However, safety concerns arise when deploying RL in real-world applications, leading to a growing demand for safe RL algorithms, such as in autonomous driving and robotics scenarios. While safe control has a long history, the study of safe RL algorithms is still in the early stages. To establish a good foundation for future safe RL research, in this paper, we provide a review of safe RL from the perspectives of methods, theory, and applications. Firstly, we review the progress of safe RL from five dimensions and come up with five crucial problems for safe RL being deployed in real-world applications, coined as "2H3W". Secondly, we analyze the algorithm and theory progress from the perspectives of answering the "2H3W" problems. Particularly, the sample complexity of safe RL algorithms is reviewed and discussed, followed by an introduction to the applications and benchmarks of safe RL algorithms. Finally, we open the discussion of the challenging problems in safe RL, hoping to inspire future research on this thread. To advance the study of safe RL algorithms, we release an open-sourced repository containing the implementations of major safe RL algorithms at the link: https://github.com/chauncygu/Safe-Reinforcement-Learning-Baselines.git.

Summary

A comprehensive survey of safe reinforcement learning that organizes the field around five critical dimensions formalized as the '2H3W' framework (addressing How to define safety, How to ensure safety, and When/Where/Why safety matters). The paper reviews algorithmic progress, sample complexity theory, real-world applications in autonomous driving and robotics, and benchmarks, while releasing an open-source implementation repository of major safe RL algorithms.

Key Points

  • Introduces the '2H3W' framework organizing five crucial problems for deploying safe RL in real-world settings, covering safety definitions, enforcement mechanisms, and application contexts.
  • Reviews sample complexity of safe RL algorithms, providing theoretical grounding for understanding how much data is needed to learn safe policies.
  • Surveys constrained Markov decision process (CMDP) formulations, Lyapunov-based methods, shielding, and other approaches to enforcing safety constraints during learning.
  • Covers real-world application domains including autonomous driving and robotics, with a discussion of existing benchmarks for evaluating safe RL algorithms.
  • Releases an open-source repository implementing major safe RL algorithms, facilitating reproducible research and community benchmarking.
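The constrained-optimisation idea behind the CMDP methods mentioned above can be illustrated with a minimal, self-contained sketch. The two-action "bandit" below is invented for illustration and is not from the paper: the agent maximizes expected reward subject to an expected-cost budget d, via the Lagrangian L(π, λ) = E[r] − λ·(E[c] − d). The primal step is a softmax best response to Lagrangian-adjusted rewards; the dual step is projected gradient ascent on λ:

```python
import math

# Hypothetical toy CMDP (numbers are illustrative, not from the paper):
# two actions, each with a known expected reward and expected cost.
r0, r1 = 1.0, 0.5   # action 0 pays a higher reward...
c0, c1 = 1.0, 0.0   # ...but is the only action that incurs cost
d = 0.3             # expected-cost budget
lam, lr = 0.0, 0.05  # Lagrange multiplier and dual step size

for _ in range(5000):
    # Primal step: softmax best response to Lagrangian-adjusted rewards r - lam*c
    adv0, adv1 = r0 - lam * c0, r1 - lam * c1
    p = math.exp(adv0) / (math.exp(adv0) + math.exp(adv1))  # P(choose action 0)

    # Dual step: raise lam when the cost constraint is violated, lower it
    # otherwise, projecting back onto lam >= 0.
    expected_cost = p * c0 + (1 - p) * c1
    lam = max(0.0, lam + lr * (expected_cost - d))

# At the saddle point the policy spends roughly the whole budget
# (expected_cost close to d) and lam approximates the constraint's shadow price.
```

Full safe RL algorithms surveyed in the paper (e.g. Lagrangian policy-gradient methods) apply this same primal-dual pattern, replacing the known expectations with learned reward and cost value estimates.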

Cited by 1 page

Page | Type | Quality
Deep Learning Revolution Era | Historical | 44.0

Cached Content Preview

HTTP 200 · Fetched Apr 7, 2026 · 98 KB
[2205.10330] A Review of Safe Reinforcement Learning: Methods, Theory and Applications

Shangding Gu, Long Yang, Yali Du, Guang Chen, Florian Walter, Jun Wang, Yaodong Yang, Alois Knoll
 Abstract

 Reinforcement Learning (RL) has achieved tremendous success in many complex decision-making tasks. However, safety concerns are usually raised when it comes to deploying RL in the real world, leading to a growing demand for safe RL algorithms, such as in autonomous driving and robotics scenarios.
While safe control has a long history, the study of safe RL algorithms is still in the early stages.
To establish a good foundation for future research in this thread, in this paper, we provide a review of safe RL from the perspectives of methods, theory, and applications.
Firstly, we review the progress of safe RL from five dimensions and come up with five crucial problems for safe RL being deployed in real-world applications, coined as “2H3W” . Secondly, we analyze the theory and algorithm progress from the perspectives of answering the “2H3W” problems. Then, the sample complexity of safe RL methods is reviewed and discussed, followed by an introduction to the applications and benchmarks of safe RL algorithms.
Finally, we open the discussion of the challenging problems in safe RL, hoping to inspire future research on this thread.

To advance the study of safe RL algorithms, we release a benchmark suite, an open-sourced repository containing the implementations of major safe RL algorithms, along with tutorials at the link: https://github.com/chauncygu/Safe-Reinforcement-Learning-Baselines.git.

 
 
Keywords: safe reinforcement learning; safety optimisation; constrained Markov decision processes; safety problems

 
 
Affiliations:
  • Department of Computer Science, Technical University of Munich, Germany
  • Institute for AI, Peking University & BIGAI, China
  • Department of Informatics, King's College London, UK
  • College of Electronic and Information Engineering, Tongji University, China
  • Department of Computer Science, University College London, UK

 
 
Highlights

  • A review of safe Reinforcement Learning (RL) methods is provided, with both theoretical and application analysis.
  • The key question that safe RL needs to answer is proposed, and five problems ("2H3W") are analysed to address it.
  • To examine the effectiveness of safe RL methods, several safe single-agent and multi-agent RL benchmarks are investigated.
  • Challenging open problems are pointed out to guide future research directions.

 
 
 
 1 Introduction

 
 Over the past decades, Reinforcement Learning (RL) has been widely adopted in 

... (truncated, 98 KB total)
Resource ID: 1efe2b3ae47b8e1b | Stable ID: sid_3ICieNmYH3