Towards simulator enabled offline validation and learning of reinforcement learning agents
Katdare, Pulkit
Permalink
https://hdl.handle.net/2142/125792
Description
- Title
- Towards simulator enabled offline validation and learning of reinforcement learning agents
- Author(s)
- Katdare, Pulkit
- Issue Date
- 2024-07-11
- Director of Research (if dissertation) or Advisor (if thesis)
- Driggs-Campbell, Katherine
- Doctoral Committee Chair(s)
- Driggs-Campbell, Katherine
- Committee Member(s)
- Varshney, Lav
- Schwing, Alexander
- Gupta, Saurabh
- Department of Study
- Electrical & Computer Eng
- Discipline
- Electrical & Computer Engr
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Reinforcement Learning
- Off-policy evaluation
- Offline Reinforcement Learning
- Abstract
- Reinforcement learning refers to a class of algorithms that learn good decisions through repeated trial-and-error interaction with an environment. Over time, reinforcement learning has achieved great success across many commercial applications, such as chatbots, complex game playing, and robotics. Although reinforcement learning applied to robotics has shown a lot of promise, it still has a long way to go, because it typically requires the robot to explore a diverse set of actions in order to learn the right ones. While such exploration is possible in domains like chess, it is rarely viable in robotics: a robot exploring its actions can be unsafe and lead to failures or accidents, which is commercially risky and potentially very costly. A common compromise in many reinforcement learning applications is to learn and validate robotic agents extensively in a software-based simulation of the robot's environment. This not only allows the robot to explore, but also makes it possible to validate against future safety concerns. After sufficient testing across different scenarios in simulation, the policy is deployed on the robot. Although common, this way of implementing robot learning does not translate well in practice because of the reality gap between the simulator and the real world, which can lead to drastic changes in performance from simulation to deployment. The reality gap, often called the sim2real gap, refers to the simulator's inability to mimic every aspect of the robot's environment, such as friction. Another common approach to reinforcement learning for robots is to use a fixed batch of expert data collected on the robot. Such a method, although promising, does not work well in practice because the robot does not generalize to states not seen in the offline data. In this thesis, we aim to combine aspects of both of these methods to perform offline validation and learning for robot agents. We note that simulation, although imperfect, does allow exploration and generalization for robotic agents. At the same time, offline data collected from the real world is limited and expensive to gather, but it captures the robot's actual interactions with the real world. The key idea of this thesis is to combine offline data with the simulator to both validate and learn, in simulation, policies that are optimal for the real world. In particular, we look at performative correction of the simulator to ensure that it accurately reflects the robot in the real world. In performative correction, we design techniques that quantitatively measure the performance of the robot in the real world: we use rollouts from the simulator as before, but re-weight them so that trajectories resembling real-world behavior receive higher weight than those that diverge from it (a minimal re-weighting sketch is given below the description). To that end, we propose two methods to validate the robot's performance in the real world. In Chapter 2, we propose a technique that estimates the sim2real gap between the simulator and the real world and corrects for it by shaping the reward function. In Chapter 3, we refine this correction by using the dual formulation of the reinforcement learning problem, leading to a min-max optimization that is harder to solve but more accurate at correcting simulator performance. In Chapter 4, we take the first steps toward a learning algorithm that utilizes these validation techniques to perform offline reinforcement learning.
- Graduation Semester
- 2024-08
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/125792
- Copyright and License Information
- Copyright 2024 Pulkit Katdare
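The central mechanism described in the abstract is re-weighting simulator rollouts so that trajectories resembling real-world behavior count more in a performance estimate. The sketch below illustrates that idea with per-transition importance weights; it is a minimal, hypothetical illustration rather than the dissertation's implementation, and the transition_weight density-ratio estimator is an assumed helper.

import numpy as np

def weighted_return_estimate(sim_trajectories, transition_weight, gamma=0.99):
    """Estimate real-world performance from simulator rollouts by importance
    re-weighting (illustrative sketch only, not the thesis implementation).

    sim_trajectories : list of trajectories, each a list of
                       (state, action, reward, next_state) tuples from the simulator.
    transition_weight: callable (state, action, next_state) -> float, a learned
                       real-vs-simulator density-ratio estimate (assumed helper).
    """
    returns, weights = [], []
    for traj in sim_trajectories:
        w, ret = 1.0, 0.0
        for t, (s, a, r, s_next) in enumerate(traj):
            w *= transition_weight(s, a, s_next)  # accumulate per-step ratio along the rollout
            ret += (gamma ** t) * r               # discounted return of the simulator rollout
        returns.append(ret)
        weights.append(w)
    weights = np.asarray(weights, dtype=float)
    returns = np.asarray(returns, dtype=float)
    # Self-normalized weighted average: rollouts far from real-world dynamics
    # receive small weights and contribute little to the estimate.
    return float(np.sum(weights * returns) / (np.sum(weights) + 1e-8))

In a validation pipeline along these lines, the transition_weight function would be fit from a small batch of real-world transitions, after which candidate policies could be scored entirely in simulation.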
Owning Collections
Graduate Dissertations and Theses at Illinois (Primary)