Model-free RL for Constrained Markov Decision Process: Foundations and Applications

Speaker

Arnob Ghosh

Affiliation

Research Scientist
Department of Electrical and Computer Engineering
The Ohio State University

Abstract

Many constrained sequential decision-making processes such as safe AV navigation, wireless network control, caching, cloud computing, etc., can be cast as Constrained Markov Decision Processes (CMDP). Reinforcement Learning (RL) algorithms have been used to learn optimal policies for unknown unconstrained MDP. Extending these RL algorithms to unknown CMDP, brings the additional challenge of not only maximizing the reward, but also satisfying the constraints. In contrast to existing model-based approaches or model-free methods accompanied with a ‘simulator’, we aim to develop the first model-free, simulator-free RL algorithm that achieves a sublinear regret and a sublinear constraint violation with respect to the number of episodes. To this end, we consider the episodic linear CMDP where the transition probabilities and rewards are linear in the feature space. In our NeurIPS paper, We show that our approach can achieve O ̃(√K) regret and O ̃(√K) constraint violation where K is the no. of episodes. Our result does not depend on the dimension of the state-space, and hence is applicable to the large state space. This is the first result which shows that O ̃(√K) regret and O ̃(√K) constraint violation is achievable using model-free RL even for small state space (a.k.a tabular case). We also show that we can in fact achieve zero constraint violation for large K. We propose a primal-dual adaptation of the LSVI-UCB algorithm and demonstrate how we overcome analytical challenges for infinite state-space using soft-max policy instead of greedy policy. In our recent ICLR paper, we also show the first (model-free or model-based) results for constrained MDP in infinite-horizon setting for large state space.

Bio

alt text 

Arnob Ghosh is currently a Research Scientist at the Dept. of Electrical and Computer Engg. At the Ohio State University. He is also associated with the NSF AI-EDGE Institute. His host is Ness Shroff since June 2021. Arnob Ghosh obtained his Ph.D. degree from the University of Pennsylvania, USA in Electrical and Systems Engg. in August 2016. Prior to joining the OSU, he was an Assistant Professor at the Industrial Engg. Department (part of Mechanical Engg.) at the IIT-Delhi since August 2019. He was also associated with the School of Artificial Intelligence at IIT-Delhi. He has also worked with Thomas Parisini of Imperial College of London as a visiting research associate. Arnob Ghosh was also a post-doc research Associate at the Purdue University from August 2016- July 2019. He also spent the summer of 2017 and the summer of 2014 at the Northwestern University and Inria respectively.