Adaptive learning from demonstration in heterogeneous agents: Concurrent minimization and maximization of surprise in sparse reward environments
Clark, Emma
Permalink
https://hdl.handle.net/2142/122173
Description
- Title
- Adaptive learning from demonstration in heterogeneous agents: Concurrent minimization and maximization of surprise in sparse reward environments
- Author(s)
- Clark, Emma
- Issue Date
- 2023-12-05
- Director of Research (if dissertation) or Advisor (if thesis)
- Mehr, Negar
- Department of Study
- Aerospace Engineering
- Discipline
- Aerospace Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- reinforcement learning
- learning from demonstration
- curriculum learning
- Abstract
- Learning from Demonstration (LfD) is a reinforcement learning method in which an agent learns a policy by imitating demonstrations from an expert. The expert can be another agent already trained to an optimal policy, a predefined control system, or a human. LfD is useful for learning very complex tasks and in settings with strict behavioral guidelines or restrictions. A major limitation of LfD is its inability to learn when the dynamics of the student and teacher agents differ. This limits LfD methods to homogeneous agents; however, real-world scenarios often involve differences in dynamics or environmental constraints between the student and teacher, such as a robot learning from human demonstration, or two different robot models with different maximum joint angles or actuator power. Even analogous systems may have small variations in robot capabilities due to noise or under-performance caused by technological limitations. To address this challenge, we propose a Student-Teacher framework in which the Teacher agent uses the Student’s surprise with respect to demonstration trajectories to infer differences in dynamics between itself and the Student. The Teacher is then able to adapt its demonstration trajectories to account for the dynamics or constraints of the Student. In contrast to most common LfD methods, we assume the Teacher is not already an expert but is instead learning in parallel with the Student.
- Graduation Semester
- 2023-12
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/122173
- Copyright and License Information
- Copyright 2023 Emma Clark
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois