Maximum entropy on-policy reinforcement learning with monotonic policy improvement
Kapadia, Mustafa
Description
- Title
- Maximum entropy on-policy reinforcement learning with monotonic policy improvement
- Author(s)
- Kapadia, Mustafa
- Issue Date
- 2023-07-21
- Director of Research (if dissertation) or Advisor (if thesis)
- Salapaka, Srinivasa M
- Department of Study
- Mechanical Science & Engineering
- Discipline
- Mechanical Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Entropy Maximization
- Deep Reinforcement Learning
- Natural Policy Gradient Methods
- Combinatorial Optimization
- Abstract
- This thesis focuses on using the maximum entropy framework to train policies, which are known for superior exploration and robustness even in the presence of model and estimation errors. Our work encompasses the development of a theoretical foundation and a sample-based on-policy reinforcement learning algorithm based on the Maximum Entropy Principle (MEP). This algorithm ensures consistent, monotonic improvement of policies across iterations, regardless of the initial policy. Furthermore, our theoretical advancements provide a framework for extending the solution of Parameterized Markov Decision Processes (ParaMDP) to state and action spaces that were previously considered intractably large. We establish the criteria for a well-posed maximum-entropy reinforcement learning problem in scenarios with an extensive number of states and actions, as well as for infinite-horizon MDPs without a cost-free termination state. By incorporating the entropy over state-action trajectories (or paths) into the objective function, we derive performance-estimation error bounds under MEP. This analysis draws parallels with, and extends, existing methods for on-policy reinforcement learning to cases where entropy maximization is added to the objective of the underlying optimization problem. We also introduce and analyze an ideal conservative policy iteration algorithm under MEP, and derive from it a practical sample-based algorithm that guarantees monotonic improvement (see the illustrative sketch following this record). To evaluate the learning performance of the proposed algorithm, we conduct experiments on both continuous-control and discrete-control benchmark problems. We observe that the resulting algorithms improve monotonically with iterations and that the training curve exhibits an O(1/T) trend, where T is the number of iterations.
- Graduation Semester
- 2023-08
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/121384
- Copyright and License Information
- Copyright 2023 Mustafa Kapadia
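Illustrative sketch: the abstract describes a conservative policy iteration scheme under MEP, in which the policy is nudged toward an entropy-regularized target rather than replaced outright. The minimal tabular sketch below shows the general shape of such an update on a small random MDP; the MDP, the mixing step alpha, the temperature beta, and all names are assumptions chosen for illustration, not the thesis's actual algorithm or parameters.

```python
# Hedged sketch only: a tabular conservative, entropy-regularized policy
# update loosely in the spirit of the abstract above. The random MDP and
# the hyperparameters (gamma, beta, alpha) are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 3
gamma, beta, alpha = 0.9, 1.0, 0.3  # discount, temperature, mixing step

# Random MDP: P[s, a] is a distribution over next states; R[s, a] a reward.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

pi = np.full((n_states, n_actions), 1.0 / n_actions)  # uniform start

def soft_policy_eval(pi, iters=500):
    """Evaluate V under the entropy-augmented objective:
    V(s) = E_pi[R(s,a) + gamma * V(s')] + beta * H(pi(.|s))."""
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = R + gamma * P @ V                   # Q[s, a]
        H = -(pi * np.log(pi + 1e-12)).sum(1)   # per-state policy entropy
        V = (pi * Q).sum(1) + beta * H
    return V

for t in range(50):
    V = soft_policy_eval(pi)
    Q = R + gamma * P @ V
    # Soft-greedy (Boltzmann) target: the maximizer of <pi, Q> + beta * H(pi).
    target = np.exp(Q / beta)
    target /= target.sum(1, keepdims=True)
    # Conservative step: mix the old policy with the target instead of
    # jumping to it; this style of update is what CPI-type analyses use to
    # guarantee each iterate is no worse than the last.
    pi = (1.0 - alpha) * pi + alpha * target

print("final soft value per state:", soft_policy_eval(pi))
```

Because each update is a convex combination of two valid distributions, every iterate remains a proper policy, and the printed soft values stabilize as the mixed policy converges; the thesis derives its own monotonic-improvement and error bounds under MEP, which this toy loop does not reproduce.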
Owning Collections
Graduate Dissertations and Theses at Illinois (Primary)