Towards simulator enabled offline validation and learning of reinforcement learning agents
Katdare, Pulkit
Permalink
https://hdl.handle.net/2142/125792
Description
- Title
- Towards simulator enabled offline validation and learning of reinforcement learning agents
- Author(s)
- Katdare, Pulkit
- Issue Date
- 2024-07-11
- Director of Research (if dissertation) or Advisor (if thesis)
- Driggs-Campbell, Katherine
- Doctoral Committee Chair(s)
- Driggs-Campbell, Katherine
- Committee Member(s)
- Varshney, Lav
- Schwing, Alexander
- Gupta, Saurabh
- Department of Study
- Electrical & Computer Eng
- Discipline
- Electrical & Computer Engr
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Reinforcement Learning
- Off-policy evaluation
- Offline Reinforcement Learning
- Abstract
- Reinforcement learning refers to a class of algorithms that learn good decisions through repeated trial-and-error interaction with an environment. Over time, reinforcement learning has achieved great success across many commercial applications, such as chatbots, complex game playing, and robotics. Although reinforcement learning applied to robotics has shown a lot of promise, it still has a long way to go, because it typically requires the robot to explore a diverse set of actions in order to learn the right ones. While such exploration is possible in domains like chess, it is rarely viable in robotics: a robot exploring its actions can be unsafe and lead to failures or accidents, which is commercially risky and potentially very costly. A common compromise in many reinforcement learning applications is to learn and validate robotic agents extensively in a software-based simulation of the robot's environment. This not only allows the robot to explore, but also makes it possible to validate against future safety concerns. After sufficient testing across different scenarios in simulation, the policy is deployed on the robot. Although common, this way of implementing robot learning does not translate well in practice because of the reality gap between the simulator and the real world, which can lead to drastic changes in performance from simulation to deployment. The reality gap, often called the sim2real gap, refers to the simulator's inability to mimic every aspect of the robot's environment, such as friction. Another common approach to reinforcement learning for robots is to use a fixed batch of expert data collected on the robot. Such a method, although promising, does not work well in practice because the robot does not generalize to states not seen in the offline data. In this thesis, we aim to combine aspects of both of these methods to perform offline validation and learning for robot agents. We note that simulation, although imperfect, does allow exploration and generalization for robotic agents. At the same time, offline data collected from the real world is limited and expensive to gather, but it captures the robot's actual interactions with the real world. The key idea of this thesis is to combine offline data with the simulator to both validate and learn, in simulation, policies that are optimal for the real world. In particular, we look at performative correction of the simulator to ensure that it accurately reflects the robot in the real world. In performative correction, we design techniques that quantitatively measure the performance of the robot in the real world: we use rollouts from the simulator as before, but re-weight them so that trajectories resembling real-world behavior receive higher weight than those that diverge from it (a minimal re-weighting sketch is given below the description). To that end, we propose two methods to validate the robot's performance in the real world. In Chapter 2, we propose a technique that estimates the sim2real gap between the simulator and the real world and corrects for it by shaping the reward function. In Chapter 3, we refine this correction by using the dual formulation of the reinforcement learning problem, leading to a min-max optimization that is harder to solve but more accurate at correcting simulator performance. In Chapter 4, we take the first steps toward a learning algorithm that utilizes these validation techniques to perform offline reinforcement learning.
- Graduation Semester
- 2024-08
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/125792
- Copyright and License Information
- Copyright 2024 Pulkit Katdare
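The central mechanism described in the abstract is re-weighting simulator rollouts so that trajectories resembling real-world behavior count more in a performance estimate. The sketch below illustrates that idea with per-transition importance weights; it is a minimal, hypothetical illustration rather than the dissertation's implementation, and the transition_weight density-ratio estimator is an assumed helper.

import numpy as np

def weighted_return_estimate(sim_trajectories, transition_weight, gamma=0.99):
    """Estimate real-world performance from simulator rollouts by importance
    re-weighting (illustrative sketch only, not the thesis implementation).

    sim_trajectories : list of trajectories, each a list of
                       (state, action, reward, next_state) tuples from the simulator.
    transition_weight: callable (state, action, next_state) -> float, a learned
                       real-vs-simulator density-ratio estimate (assumed helper).
    """
    returns, weights = [], []
    for traj in sim_trajectories:
        w, ret = 1.0, 0.0
        for t, (s, a, r, s_next) in enumerate(traj):
            w *= transition_weight(s, a, s_next)  # accumulate per-step ratio along the rollout
            ret += (gamma ** t) * r               # discounted return of the simulator rollout
        returns.append(ret)
        weights.append(w)
    weights = np.asarray(weights, dtype=float)
    returns = np.asarray(returns, dtype=float)
    # Self-normalized weighted average: rollouts far from real-world dynamics
    # receive small weights and contribute little to the estimate.
    return float(np.sum(weights * returns) / (np.sum(weights) + 1e-8))

In a validation pipeline along these lines, the transition_weight function would be fit from a small batch of real-world transitions, after which candidate policies could be scored entirely in simulation.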
Owning Collections
Graduate Dissertations and Theses at Illinois (Primary)