Policy-based average-reward and robust Markov decision processes and reinforcement learning
Murthy, Yashaswini
Permalink
https://hdl.handle.net/2142/129852
Description
- Title
- Policy-based average-reward and robust Markov decision processes and reinforcement learning
- Author(s)
- Murthy, Yashaswini
- Issue Date
- 2025-07-11
- Director of Research (if dissertation) or Advisor (if thesis)
- Srikant, Rayadurgam
- Doctoral Committee Chair(s)
- Srikant, Rayadurgam
- Committee Member(s)
- Hajek, Bruce
- Stolyar, Aleksandr
- Hu, Bin
- Department of Study
- Electrical & Computer Engineering
- Discipline
- Electrical & Computer Engineering
- Degree Granting Institution
- University of Illinois Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Reinforcement Learning
- Markov Decision Processes
- Abstract
- This thesis addresses critical challenges in applying Reinforcement Learning (RL) to complex, real-world control problems by developing and analyzing policy-based algorithms for Markov Decision Processes (MDPs) under average-reward, robust (risk-sensitive), and countable-space settings. Traditional RL methods often fall short due to assumptions of finite state/action spaces, bounded costs, or reliance on discounted reward criteria, which may not align with long-run performance objectives in dynamic systems. The research presented herein makes several key contributions. First, for average-reward MDPs, we establish rigorous finite-time performance bounds for Approximate Policy Iteration (API) and prove global convergence with $O(\log(T))$ regret for Projected Policy Gradient (PPG) methods, notably by proving the smoothness of the average-reward function. We also demonstrate global convergence for Natural Policy Gradient (NPG)/Mirror Descent Methods (MDM) in the tabular average-reward setting. Second, to tackle problems with countably infinite state spaces and unbounded costs, common in queuing systems, we develop an NPG algorithm with state-dependent step sizes derived from novel policy-independent bounds on the relative value function, achieving non-trivial $O(\sqrt{T})$ regret bounds and relaxing learning error assumptions. Third, for robust decision-making under model uncertainty and cost variability, we introduce Modified Policy Iteration (MPI) for risk-sensitive average-cost MDPs with exponential cost criteria, proving its finite-time convergence. This is extended to an Approximate MPI (AMPI) framework and an Approximate PI framework, providing the first convergence guarantees for approximate methods in this risk-sensitive, robust setting and deriving a unified stability condition for error propagation across risk-neutral and risk-sensitive regimes. Collectively, this work advances the theoretical understanding and practical applicability of policy-based RL, providing new algorithms, convergence guarantees, and analytical tools for optimizing long-run performance in large-scale, uncertain, and risk-aware environments.
- Graduation Semester
- 2025-08
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/129852
- Copyright and License Information
- Copyright 2025 Yashaswini Murthy
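
The abstract above centers on policy-based methods (approximate policy iteration, projected and natural policy gradient / mirror descent) for average-reward MDPs. Purely as an illustration of that class of algorithms, below is a minimal sketch of a tabular average-reward natural policy gradient loop on a small synthetic MDP. This is not the thesis's implementation: the MDP dimensions, the step size eta, and the iteration count are assumptions chosen only for the example.

```python
# Minimal illustrative sketch (NOT the thesis's code): tabular natural policy
# gradient (NPG) / mirror descent for an average-reward MDP. The MDP below is
# a small random instance; sizes, step size, and iteration count are assumed.
import numpy as np

rng = np.random.default_rng(0)
S, A = 5, 3                                    # assumed small state/action spaces
P = rng.dirichlet(np.ones(S), size=(S, A))     # P[s, a] is a distribution over next states
R = rng.uniform(0.0, 1.0, size=(S, A))         # bounded rewards

def evaluate(pi):
    """Return the average reward g and relative value (bias) h of policy pi.

    Solves the average-reward evaluation (Poisson) equations
        g + h(s) = r_pi(s) + sum_{s'} P_pi(s, s') h(s'),   with h(0) = 0,
    assuming the chain induced by pi is unichain so (g, h) is well defined.
    """
    P_pi = np.einsum('sa,sap->sp', pi, P)      # transition matrix under pi
    r_pi = np.einsum('sa,sa->s', pi, R)        # expected one-step reward under pi
    M = np.zeros((S + 1, S + 1))
    b = np.zeros(S + 1)
    M[:S, 0] = 1.0                             # coefficient of the unknown g
    M[:S, 1:] = np.eye(S) - P_pi               # (I - P_pi) h
    b[:S] = r_pi
    M[S, 1] = 1.0                              # normalization h(0) = 0
    x = np.linalg.solve(M, b)
    return x[0], x[1:]                         # g, h

def npg_step(pi, eta):
    """One tabular NPG / mirror-descent step: pi'(a|s) ∝ pi(a|s) exp(eta * Q(s, a))."""
    g, h = evaluate(pi)
    Q = R - g + P @ h                          # relative action-value function
    logits = np.log(pi) + eta * Q
    logits -= logits.max(axis=1, keepdims=True)    # for numerical stability
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum(axis=1, keepdims=True)

pi = np.full((S, A), 1.0 / A)                  # start from the uniform policy
for _ in range(200):
    pi = npg_step(pi, eta=0.5)
g, _ = evaluate(pi)
print(f"average reward after 200 NPG iterations: {g:.4f}")
```

The multiplicative-weights form of the update, pi'(a|s) ∝ pi(a|s) exp(eta * Q(s, a)), is why tabular NPG with a softmax parameterization coincides with mirror descent under the KL geometry, which is the correspondence the abstract's NPG/MDM results refer to.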
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)