MACPO (Multi-Agent Constrained Policy Optimization)
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: arXiv
A comprehensive survey of Safe Reinforcement Learning (SafeRL) and Multi-Agent Safe RL, covering Constrained Markov Decision Processes and theoretical foundations essential for developing provably safe AI systems.
Paper Details
Metadata
Abstract
Safe Reinforcement Learning (SafeRL) is the subfield of reinforcement learning that explicitly deals with safety constraints during the learning and deployment of agents. This survey provides a mathematically rigorous overview of SafeRL formulations based on Constrained Markov Decision Processes (CMDPs) and extensions to Multi-Agent Safe RL (SafeMARL). We review theoretical foundations of CMDPs, covering definitions, constrained optimization techniques, and fundamental theorems. We then summarize state-of-the-art algorithms in SafeRL for single agents, including policy gradient methods with safety guarantees and safe exploration strategies, as well as recent advances in SafeMARL for cooperative and competitive settings. Additionally, we propose five open research problems to advance the field, with three focusing on SafeMARL. Each problem is described with motivation, key challenges, and related prior work. This survey is intended as a technical guide for researchers interested in SafeRL and SafeMARL, highlighting key concepts, methods, and open future research directions.
Summary
This survey provides a comprehensive technical overview of Safe Reinforcement Learning (SafeRL), focusing on Constrained Markov Decision Processes (CMDPs) and their extensions to multi-agent settings (SafeMARL). The paper reviews theoretical foundations of CMDPs, state-of-the-art algorithms for single-agent SafeRL including policy gradient methods with safety guarantees and safe exploration strategies, and recent advances in SafeMARL for both cooperative and competitive scenarios. The authors identify five open research problems to guide future work, with particular emphasis on advancing SafeMARL, making this a technical reference for researchers developing safe learning algorithms.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Multi-Agent Safety | Approach | 68.0 |
Cached Content Preview
A Survey of Safe Reinforcement Learning and Constrained MDPs: A Technical Survey on Single-Agent and Multi-Agent Safety
Ankita Kushwaha
International Institute of Information Technology, Hyderabad
Kiran Ravish
International Institute of Information Technology, Hyderabad
Preeti Lamba
International Institute of Information Technology, Hyderabad
Pawan Kumar
International Institute of Information Technology, Hyderabad
Abstract
Safe Reinforcement Learning (SafeRL) is the subfield of reinforcement learning that explicitly deals with safety constraints during the learning and deployment of agents. This survey provides a mathematically rigorous overview of SafeRL formulations based on Constrained Markov Decision Processes (CMDPs) and extensions to Multi-Agent Safe RL (SafeMARL). We review theoretical foundations of CMDPs, covering definitions, constrained optimization techniques, and fundamental theorems. We then summarize state-of-the-art algorithms in SafeRL for single agents, including policy gradient methods with safety guarantees and safe exploration strategies, as well as recent advances in SafeMARL for cooperative and competitive settings. Additionally, we propose five open research problems to advance the field, with three focusing on SafeMARL. Each problem is described with motivation, key challenges, and related prior work. This survey is intended as a technical guide for researchers interested in SafeRL and SafeMARL, highlighting key concepts, methods, and open future research directions.
1 Introduction
Reinforcement learning (RL) has achieved remarkable success in domains such as games, robotics, and autonomous systems. However, when deploying RL in real-world safety-critical applications (e.g., autonomous driving, healthcare, robotics), it is essential to ensure that the learning agent avoids catastrophic failures or unsafe behaviors Amodei et al. (2016); Garcia and Fernandez (2015). Safe Reinforcement Learning (SafeRL) addresses this need by augmenting standard RL objectives with safety considerations, typically in the form of constraints on the agent's behavior or environment outcomes.
Definition 1.1.
The goal in SafeRL is to maximize performance (cumulative reward) while satisfying safety constraints during training and deployment.
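This goal is commonly written as a constrained optimization problem; a standard formulation, using the usual CMDP notation (reward $r$, cost functions $c_i$ with thresholds $d_i$, and discount factor $\gamma$ — symbols assumed here, as the notation is not fixed in this excerpt), is:

```latex
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]
\quad \text{s.t.} \quad
\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c_i(s_t, a_t)\right] \le d_i,
\qquad i = 1, \dots, m
```

Here the agent maximizes expected discounted return over policies $\pi$ while keeping each expected discounted cumulative cost below its budget $d_i$.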
A common framework for SafeRL is the Constrained Markov Decision Process (CMDP) introduced by Altman (1999). In a CMDP, an agent seeks to maximize expected return subject to one or more constraints (e.g., bounds on certain costs or probabilities of unsafe events). This framework allows formalizing safety requirements as mathematical constraints and provides tools from constrained optimization and control theory to enforce them. SafeRL algorithms often leverage CMDP theory to find
... (truncated, 98 KB total)