Policy-based average-reward and robust Markov decision processes and reinforcement learning
Murthy, Yashaswini
Permalink
https://hdl.handle.net/2142/129852
Description
- Title
- Policy-based average-reward and robust Markov decision processes and reinforcement learning
- Author(s)
- Murthy, Yashaswini
- Issue Date
- 2025-07-11
- Director of Research (if dissertation) or Advisor (if thesis)
- Srikant, Rayadurgam
- Doctoral Committee Chair(s)
- Srikant, Rayadurgam
- Committee Member(s)
- Hajek, Bruce
- Stolyar, Aleksandr
- Hu, Bin
- Department of Study
- Electrical & Computer Engineering
- Discipline
- Electrical & Computer Engineering
- Degree Granting Institution
- University of Illinois Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Reinforcement Learning
- Markov Decision Processes
- Language
- eng
- Abstract
- This thesis addresses critical challenges in applying Reinforcement Learning (RL) to complex, real-world control problems by developing and analyzing policy-based algorithms for Markov Decision Processes (MDPs) in average-reward, robust (risk-sensitive), and countable-state settings. Traditional RL methods often fall short because they assume finite state and action spaces or bounded costs, or because they rely on discounted-reward criteria that may not align with long-run performance objectives in dynamic systems. The research presented here makes several key contributions. First, for average-reward MDPs, we establish rigorous finite-time performance bounds for Approximate Policy Iteration (API) and prove global convergence with $O(\log(T))$ regret for Projected Policy Gradient (PPG) methods, notably by proving smoothness of the average-reward objective. We also establish global convergence for Natural Policy Gradient (NPG)/Mirror Descent Methods (MDM) in the tabular average-reward setting (minimal illustrative sketches of the tabular NPG update and of the risk-sensitive criterion appear after this record). Second, to tackle problems with countably infinite state spaces and unbounded costs, common in queueing systems, we develop an NPG algorithm with state-dependent step sizes derived from novel policy-independent bounds on the relative value function, achieving non-trivial $O(\sqrt{T})$ regret bounds while relaxing standard assumptions on learning errors. Third, for robust decision-making under model uncertainty and cost variability, we introduce Modified Policy Iteration (MPI) for risk-sensitive (exponential-cost) average-cost MDPs and prove its finite-time convergence. We extend this to an Approximate MPI (AMPI) framework and an Approximate PI framework, providing the first convergence guarantees for approximate methods in this risk-sensitive, robust setting and deriving a unified stability condition for error propagation across the risk-neutral and risk-sensitive regimes. Collectively, this work advances the theoretical understanding and practical applicability of policy-based RL, providing new algorithms, convergence guarantees, and analytical tools for optimizing long-run performance in large-scale, uncertain, and risk-aware environments.
- Graduation Semester
- 2025-08
- Type of Resource
- Text
- Handle URL
- https://hdl.handle.net/2142/129852
- Copyright and License Information
- Copyright 2025 Yashaswini Murthy
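To make the tabular average-reward NPG result concrete, here is a minimal sketch in Python. It is not the thesis's algorithm verbatim: it assumes a small, known, ergodic MDP with exact policy evaluation, a constant step size `eta`, and a softmax-style multiplicative update; the function names (`evaluate_policy`, `npg_average_reward`) and all parameter choices are illustrative.

```python
import numpy as np

def stationary_distribution(P_pi):
    """Stationary distribution mu of an ergodic transition matrix P_pi."""
    n = P_pi.shape[0]
    # Solve mu @ (P_pi - I) = 0 together with sum(mu) = 1.
    A = np.vstack([P_pi.T - np.eye(n), np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    mu, *_ = np.linalg.lstsq(A, b, rcond=None)
    return mu

def evaluate_policy(P, r, pi):
    """Average reward rho and differential Q-values of a fixed policy.

    P: (S, A, S) transition tensor, r: (S, A) rewards, pi: (S, A) policy.
    """
    S, _ = r.shape
    P_pi = np.einsum('sa,sat->st', pi, P)   # transition matrix induced by pi
    r_pi = np.sum(pi * r, axis=1)           # expected one-step reward per state
    mu = stationary_distribution(P_pi)
    rho = mu @ r_pi                         # long-run average reward of pi
    # Bias h solves (I - P_pi) h = r_pi - rho, pinned down by mu @ h = 0.
    M = np.vstack([np.eye(S) - P_pi, mu])
    h, *_ = np.linalg.lstsq(M, np.append(r_pi - rho, 0.0), rcond=None)
    # Differential Q-function: Q(s, a) = r(s, a) - rho + E[h(s') | s, a].
    Q = r - rho + np.einsum('sat,t->sa', P, h)
    return rho, Q

def npg_average_reward(P, r, eta=0.5, iters=200):
    """Tabular NPG: multiplicative-weights update driven by differential Q."""
    S, A = r.shape
    pi = np.full((S, A), 1.0 / A)           # start from the uniform policy
    for _ in range(iters):
        rho, Q = evaluate_policy(P, r, pi)
        # pi(a|s) <- pi(a|s) * exp(eta * Q(s, a)), renormalized per state;
        # subtracting the row max keeps the exponentials well scaled.
        pi = pi * np.exp(eta * (Q - Q.max(axis=1, keepdims=True)))
        pi /= pi.sum(axis=1, keepdims=True)
    return pi, rho

# Usage on a randomly generated MDP (Dirichlet rows are ergodic a.s.):
rng = np.random.default_rng(0)
S, A = 4, 3
P = rng.dirichlet(np.ones(S), size=(S, A))  # shape (S, A, S)
r = rng.uniform(size=(S, A))
pi, rho = npg_average_reward(P, r)
print(f"average reward of the learned policy: {rho:.4f}")
```

With exact evaluation, this update can be read as mirror descent with a KL regularizer over each state's action simplex, which is the lens the NPG/MDM phrasing in the abstract suggests.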
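The risk-sensitive objects mentioned in the abstract can also be stated compactly. The display below gives the standard exponential-cost average-cost criterion and the multiplicative evaluation operator that an MPI-style scheme would iterate; this is background notation, not the thesis's analysis, and the symbols ($\beta$, $c$, $T_\pi$, $m$) follow common usage rather than the thesis itself.

```latex
% Risk-sensitivity parameter \beta > 0; stage cost c(s, a).
% Exponential-cost average cost of a stationary policy \pi:
\lambda_\beta(\pi) \;=\; \lim_{T \to \infty} \frac{1}{\beta T}
  \log \mathbb{E}_\pi\!\left[ \exp\!\Big( \beta \sum_{t=0}^{T-1} c(s_t, a_t) \Big) \right].

% Evaluation reduces to a multiplicative Poisson equation: find \rho > 0
% and a positive function V with
\rho\, V(s) \;=\; e^{\beta c(s, \pi(s))} \sum_{s'} P(s' \mid s, \pi(s))\, V(s'),
\qquad \lambda_\beta(\pi) \;=\; \tfrac{1}{\beta} \log \rho.

% Modified Policy Iteration replaces exact evaluation with m applications
% of the multiplicative operator before the next greedy improvement:
(T_\pi V)(s) \;=\; e^{\beta c(s, \pi(s))} \sum_{s'} P(s' \mid s, \pi(s))\, V(s').
```

As $\beta \to 0$ this criterion recovers the risk-neutral average cost, which is consistent with the abstract's unified treatment of error propagation across the two regimes.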
Owning Collections
Graduate Dissertations and Theses at Illinois (PRIMARY)
Dissertations and Theses - Electrical and Computer Engineering