
MACPO (Multi-Agent Constrained Policy Optimization)

paper

Authors

Ankita Kushwaha · Kiran Ravish · Preeti Lamba · Pawan Kumar

Credibility Rating

Good (3/5)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

A comprehensive survey of Safe Reinforcement Learning (SafeRL) and Multi-Agent Safe RL, covering Constrained Markov Decision Processes and theoretical foundations essential for developing provably safe AI systems.

Paper Details

Citations: 0 (0 influential)
Year: 2025
Methodology: survey

Metadata

arXiv preprint · primary source

Abstract

Safe Reinforcement Learning (SafeRL) is the subfield of reinforcement learning that explicitly deals with safety constraints during the learning and deployment of agents. This survey provides a mathematically rigorous overview of SafeRL formulations based on Constrained Markov Decision Processes (CMDPs) and extensions to Multi-Agent Safe RL (SafeMARL). We review theoretical foundations of CMDPs, covering definitions, constrained optimization techniques, and fundamental theorems. We then summarize state-of-the-art algorithms in SafeRL for single agents, including policy gradient methods with safety guarantees and safe exploration strategies, as well as recent advances in SafeMARL for cooperative and competitive settings. Additionally, we propose five open research problems to advance the field, with three focusing on SafeMARL. Each problem is described with motivation, key challenges, and related prior work. This survey is intended as a technical guide for researchers interested in SafeRL and SafeMARL, highlighting key concepts, methods, and open future research directions.

Summary

This survey provides a comprehensive technical overview of Safe Reinforcement Learning (SafeRL), focusing on Constrained Markov Decision Processes (CMDPs) and their extensions to multi-agent settings (SafeMARL). The paper reviews the theoretical foundations of CMDPs, state-of-the-art algorithms for single-agent SafeRL including policy gradient methods with safety guarantees and safe exploration strategies, and recent advances in SafeMARL for both cooperative and competitive scenarios. The authors identify five open research problems to guide future work, three of which focus on SafeMARL, making this a technical reference for researchers developing safe learning algorithms.
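Among the single-agent methods the survey covers, the Lagrangian (primal-dual) relaxation of the CMDP is the simplest to illustrate. The sketch below shows one generic primal-dual update; it is a minimal illustration of the technique, not the paper's algorithm, and every name in it (lagrangian_update, reward_grad, cost_limit, and so on) is a hypothetical placeholder.

    import numpy as np

    def lagrangian_update(theta, lam, reward_grad, cost_grad, avg_cost,
                          cost_limit, lr_theta=1e-3, lr_lam=1e-2):
        """One primal-dual step on L(theta, lam) = J_r(theta) - lam * (J_c(theta) - d)."""
        # Primal ascent: improve the reward objective while penalizing
        # the cost-increasing direction, weighted by the multiplier lam.
        theta = theta + lr_theta * (reward_grad - lam * cost_grad)
        # Dual ascent: raise the multiplier whenever the estimated cost
        # exceeds the threshold d; project back onto lam >= 0.
        lam = max(0.0, lam + lr_lam * (avg_cost - cost_limit))
        return theta, lam

    # Toy loop with random gradient estimates, purely to show the call pattern.
    rng = np.random.default_rng(0)
    theta, lam = np.zeros(4), 0.0
    for _ in range(100):
        theta, lam = lagrangian_update(theta, lam,
                                       reward_grad=rng.normal(size=4),
                                       cost_grad=rng.normal(size=4),
                                       avg_cost=rng.uniform(0.0, 2.0),
                                       cost_limit=1.0)

In practice the gradients and the cost estimate come from rollouts of the current policy; the dual update steers the policy back toward the feasible region whenever the cost constraint is violated.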

Cited by 1 page

Page                 Type      Quality
Multi-Agent Safety   Approach  68.0

Cached Content Preview

HTTP 200 · Fetched Apr 9, 2026 · 98 KB
A Survey of Safe Reinforcement Learning and Constrained MDPs: A Technical Survey on Single-Agent and Multi-Agent Safety
Ankita Kushwaha · Kiran Ravish · Preeti Lamba · Pawan Kumar
International Institute of Information Technology, Hyderabad

 Abstract

 Safe Reinforcement Learning (SafeRL) is the subfield of reinforcement learning that explicitly deals with safety constraints during the learning and deployment of agents. This survey provides a mathematically rigorous overview of SafeRL formulations based on Constrained Markov Decision Processes (CMDPs) and extensions to Multi-Agent Safe RL (SafeMARL). We review theoretical foundations of CMDPs, covering definitions, constrained optimization techniques, and fundamental theorems. We then summarize state-of-the-art algorithms in SafeRL for single agents, including policy gradient methods with safety guarantees and safe exploration strategies, as well as recent advances in SafeMARL for cooperative and competitive settings. Additionally, we propose five open research problems to advance the field, with three focusing on SafeMARL. Each problem is described with motivation, key challenges, and related prior work. This survey is intended as a technical guide for researchers interested in SafeRL and SafeMARL, highlighting key concepts, methods, and open future research directions.

 

 
 
 1 Introduction

 
Reinforcement learning (RL) has achieved remarkable success in domains such as games, robotics, and autonomous systems. However, when deploying RL in real-world safety-critical applications (e.g., autonomous driving, healthcare, robotics), it is essential to ensure that the learning agent avoids catastrophic failures or unsafe behaviors Amodei et al. (2016); Garcia and Fernandez (2015). Safe Reinforcement Learning (SafeRL) addresses this need by augmenting standard RL objectives with safety considerations, typically in the form of constraints on the agent's behavior or environment outcomes.

 
 
 
Definition 1.1. The goal in SafeRL is to maximize performance (cumulative reward) while satisfying safety constraints during training and deployment.
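As a formal restatement of this goal (notation assumed here rather than quoted from the paper), for a CMDP with reward r, cost functions c_i, thresholds d_i, and discount factor \gamma, SafeRL solves the constrained program

    \max_{\pi}\; \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big]
    \quad \text{s.t.} \quad
    \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, c_i(s_t, a_t)\Big] \le d_i,
    \qquad i = 1, \dots, m,

where the expectation is over trajectories \tau generated by policy \pi.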

 
 
 
A common framework for SafeRL is the Constrained Markov Decision Process (CMDP) introduced by Altman (1999). In a CMDP, an agent seeks to maximize expected return subject to one or more constraints (e.g., bounds on certain costs or probabilities of unsafe events). This framework allows formalizing safety requirements as mathematical constraints and provides tools from constrained optimization and control theory to enforce them. SafeRL algorithms often leverage CMDP theory to find

... (truncated, 98 KB total)
Resource ID: 7ba5b02ca89ba9eb | Stable ID: sid_GE6rD58DKY