Topics in offline statistical reinforcement learning: addressing challenges in continuous actions, distribution shifts, and unmeasured confounding
Li, Yuhan
Permalink
https://hdl.handle.net/2142/130009
Description
Title
Topics in offline statistical reinforcement learning: addressing challenges in continuous actions, distribution shifts, and unmeasured confounding
Author(s)
Li, Yuhan
Issue Date
2025-06-19
Director of Research (if dissertation) or Advisor (if thesis)
Zhu, Ruoqing
Doctoral Committee Chair(s)
Zhu, Ruoqing
Committee Member(s)
Shao, Xiaofeng
Zhao, Sihai Dave
Park, Chan
Department of Study
Statistics
Discipline
Statistics
Degree Granting Institution
University of Illinois Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
Reinforcement Learning
Personalized Medicine
Causal Inference
Markov Decision Process
Policy Learning
Policy Evaluation
Abstract
Reinforcement learning (RL) provides a principled framework for tackling sequential decision-making problems when system dynamics and outcomes are uncertain. While RL has made significant progress in recent decades, deploying these methods in real-world scenarios remains challenging. A major obstacle is that standard RL algorithms typically focus on the online setting, where an agent continuously interacts with the environment to collect data, updating its policy in real time and learning by trial and error. In many practical domains, however, data collection is costly, and unconstrained exploration can raise serious safety and ethical concerns, especially in safety-critical areas such as personalized medicine and autonomous driving. Consequently, there is growing interest in offline RL, where the goal is to evaluate and optimize policies using only a fixed, pre-collected dataset, without any further interaction with the environment. This thesis tackles several major challenges in offline reinforcement learning.

In the first part of the thesis, we focus on policy learning with continuous action spaces and introduce a novel quasi-optimal Bellman operator that identifies near-optimal action regions. The proposed operator addresses the shortcomings of existing approaches that rely on modeling the optimal policy with distributions of infinite support, and is therefore highly desirable in safety-critical scenarios.

In the second part, we study high-confidence off-policy evaluation in infinite-horizon Markov decision processes, where the objective is to construct a confidence interval (CI) for the value of a target policy using only pre-collected data generated by unknown behavior policies. The proposed unified error-quantification framework handles distributional shift and balances the trade-off between bias and uncertainty to produce tight confidence intervals.

The third part of the thesis considers offline policy learning in the presence of unmeasured confounders. We extend the proximal causal inference framework to the infinite-horizon setting and develop a novel identification result that enables nonparametric estimation of the policy value. Leveraging this identification result, we further develop a policy-gradient-type algorithm for offline policy learning despite hidden confounders.
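For context on the first part: the display below recalls the standard Bellman optimality operator for a discounted MDP with reward function r, transition kernel P, and discount factor γ ∈ (0, 1). This is standard background only, not the quasi-optimal operator constructed in the thesis, which the abstract describes as identifying near-optimal action regions rather than a single maximizing action.

\[
(\mathcal{T}^{*} Q)(s, a) \;=\; r(s, a) \;+\; \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s, a)}\Big[\max_{a' \in \mathcal{A}} Q(s', a')\Big].
\]

With a continuous action space \(\mathcal{A}\), the hard maximum over a' is generally intractable and concentrates mass on a single action, which is one reason the continuous-action setting calls for operators of the kind developed in the first part.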
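As a point of reference for the second part, the following Python sketch gives a naive trajectory-level importance-sampling estimate of a target policy's value together with a percentile-bootstrap confidence interval. It assumes finite-horizon truncated trajectories and known behavior-policy probabilities, and all names (is_value_with_ci, target_prob, behavior_prob) are hypothetical; this is a minimal baseline, not the unified error-quantification framework developed in the thesis.

import numpy as np

def is_value_with_ci(trajectories, target_prob, behavior_prob, gamma=0.99,
                     n_boot=2000, alpha=0.05, seed=0):
    """Naive trajectory-level importance-sampling estimate of a target
    policy's value, with a percentile-bootstrap confidence interval.

    trajectories: list of trajectories, each a list of (state, action, reward).
    target_prob(s, a): probability of action a under the target policy.
    behavior_prob(s, a): probability of action a under the behavior policy.
    """
    estimates = []
    for traj in trajectories:
        weight, value = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            # Cumulative likelihood ratio between target and behavior policy.
            weight *= target_prob(s, a) / behavior_prob(s, a)
            # Discounted return of the observed trajectory.
            value += (gamma ** t) * r
        estimates.append(weight * value)
    estimates = np.asarray(estimates)

    # Percentile bootstrap over trajectories for an approximate (1 - alpha) CI.
    rng = np.random.default_rng(seed)
    boot_means = np.array([
        rng.choice(estimates, size=len(estimates), replace=True).mean()
        for _ in range(n_boot)
    ])
    lower, upper = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return estimates.mean(), (lower, upper)

Trajectory-level weights of this kind are known to have variance that grows quickly with the horizon, which illustrates why tighter, bias-aware interval construction is needed in the infinite-horizon setting described in the abstract.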