Files in this item

File: JOSHI-DISSERTATION-2020.pdf (15MB)
Description: (no description provided)
Format: application/pdf (PDF)
Description

Title: Adapt-to-learn policy transfer in reinforcement learning and deep model reference adaptive control
Author(s): Joshi, Girish
Director of Research: Chowdhary, Girish
Doctoral Committee Chair(s): Langbort, Cedric
Doctoral Committee Member(s): Srikant, Rayadurgam; Hovakimyan, Naira; West, Matthew
Department / Program: Aerospace Engineering
Discipline: Aerospace Engineering
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: Ph.D.
Genre: Dissertation
Subject(s): Reinforcement Learning, Policy Transfer in Reinforcement Learning, Adapt to Learn, Adaptive Control, MRAC, Deep Model Reference Adaptive Control
Abstract: Adaptation and learning from exploration are central to biological learning: humans and animals do not learn every task in isolation; rather, they quickly adapt learned behaviors between similar tasks and learn new skills when presented with new situations. Inspired by this, adaptation has long been an important direction of research in control, in the form of adaptive controllers. However, adaptive controllers such as the Model Reference Adaptive Controller (MRAC) are mainly model-based: they do not rely on exploration but instead make informed decisions by exploiting the model's structure. Such controllers are therefore characterized by high sample efficiency and stability guarantees, making them suitable for safety-critical systems. On the other hand, there are learning-based optimal control algorithms such as reinforcement learning (RL). Reinforcement learning is a trial-and-error method in which an agent explores the environment by taking random actions and increasing the likelihood of those actions that yield higher returns. These exploration techniques are expected to fail many times before discovering an optimal policy; they are therefore highly sample-expensive and lack stability guarantees, and hence are not suitable for safety-critical systems. This thesis presents control algorithms for robotics that bring together the best of both worlds, "adaptation" and "learning from exploration," to propose new algorithms that perform better than their conventional counterparts. We first present an Adapt-to-Learn policy transfer algorithm, which uses control-theoretic ideas of adaptation to transfer a policy between two related but different tasks using the policy gradient method of reinforcement learning. Efficient and robust policy transfer remains a key challenge in reinforcement learning. Policy transfer through warm initialization, imitation, or interaction with a large set of randomized task instances is commonly applied to solve a variety of RL tasks, but this is far from how behavior transfer happens in the biological world. Here we seek to answer the question: will learning to combine an adaptation reward with the environmental reward lead to more efficient transfer of policies between domains? We introduce a principled mechanism that can "Adapt-to-Learn," that is, adapt the source policy to learn to solve a target task with significant transition differences and uncertainties. Through theory and experiments, we show that our method significantly reduces the sample complexity of transferring policies between tasks. In the second part of this thesis, information-enabled, learning-based adaptive controllers are presented: the Gaussian Process adaptive controller using a Model Reference Generative Network (GP-MRGeN) and the Deep Model Reference Adaptive Controller (DMRAC). Model reference adaptive control (MRAC) is a widely studied adaptive control methodology that aims to make a nonlinear plant with significant model uncertainty behave like a chosen reference model. MRAC methods adapt to changes by representing the system uncertainties as weighted combinations of known nonlinear functions and by using a weight update law that moves the network weights in the direction that minimizes the instantaneous tracking error.
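To make the combined objective behind Adapt-to-Learn concrete, here is a minimal policy-gradient sketch on a mixed reward. Everything in it is an illustrative assumption rather than the thesis's implementation: the trade-off weight beta, the REINFORCE-style estimator, and all function names are hypothetical.

```python
import numpy as np

def mixed_reward(r_env, r_adapt, beta=0.5):
    # Convex combination of the environmental reward and an adaptation
    # reward; beta is a hypothetical trade-off weight, not a value
    # taken from the thesis.
    return (1.0 - beta) * r_env + beta * r_adapt

def reinforce_step(theta, episode, alpha=1e-2, gamma=0.99):
    """One REINFORCE-style policy-gradient step on the mixed reward.

    episode is a list of (grad_log_pi, r_env, r_adapt) tuples, where
    grad_log_pi is the score function of the source-initialized policy
    at the visited state-action pair (same shape as theta).
    """
    G = 0.0
    update = np.zeros_like(theta)
    # Accumulate discounted mixed returns backwards through the
    # episode, weighting each score by the return that follows it.
    for grad_log_pi, r_env, r_adapt in reversed(episode):
        G = mixed_reward(r_env, r_adapt) + gamma * G
        update += grad_log_pi * G
    return theta + alpha * update
```

The structural point of the sketch is simply that the adaptation reward enters the same return that the policy gradient ascends, so adapting the source policy and learning the target task share one update.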
However, most MRAC controllers use a shallow network and only instantaneous data for adaptation, restricting their representation capability and limiting their performance under fast-changing uncertainties and faults in the system. In this thesis, we propose a Gaussian-process-based adaptive controller called GP-MRGeN. We present a new approach to the online supervised training of GP models using a new architecture termed the Model Reference Generative Network (MRGeN). Our architecture is loosely inspired by the recent success of generative neural network models; nevertheless, our contributions ensure that including such a model in closed-loop control does not affect the stability properties. By using a generative network, the GP-MRGeN controller achieves higher adaptation rates without losing the robustness properties of the controller, making it suitable for mitigating faults in fast-evolving systems. Further, we present a new neuroadaptive architecture: Deep Neural Network-based Model Reference Adaptive Control (DMRAC). This architecture uses deep neural network representations to model significant nonlinearities while retaining the boundedness guarantees that characterize MRAC-based controllers. We demonstrate through simulations and analysis that DMRAC subsumes previously studied learning-based MRAC methods, such as concurrent learning and GP-MRAC, making DMRAC a powerful architecture for high-performance control of nonlinear systems with long-term learning properties. Theoretical proofs of the controller's generalization over unseen data points and of the boundedness of the tracking error are also presented. Experiments with a quadrotor vehicle demonstrate the controller's performance in tracking a reference model in the presence of significant matched uncertainties. A software and communication architecture is designed to ensure online, real-time inference of the deep network on a high-bandwidth, computation-limited platform. These results demonstrate the efficacy of deep networks for high-bandwidth closed-loop attitude control of unstable and nonlinear robots operating in adverse situations. We expect this work to benefit other closed-loop deep-learning control architectures for robotics.
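As a rough illustration of how a deep network can be married to MRAC-style boundedness arguments, the sketch below treats the network's final layer as the adaptive element and updates it online with the standard MRAC gradient law driven by the tracking error, while the inner layers supply a fixed feature vector. This is a hedged reading of the DMRAC idea, not its verbatim algorithm; the variable names, shapes, and the choice to freeze the inner layers are all assumptions.

```python
import numpy as np

def dmrac_outer_layer_update(W, phi, e, P, B, Gamma, dt):
    """One Euler step of an MRAC-style update on the outer-layer weights.

    W     : (n_features, n_inputs) adaptive outer-layer weights
    phi   : (n_features,) features from the frozen inner layers at x
    e     : (n_states,) tracking error x - x_ref
    P     : (n_states, n_states) solution of the Lyapunov equation
            A_ref^T P + P A_ref = -Q for the reference model
    B     : (n_states, n_inputs) plant input matrix
    Gamma : scalar adaptation gain

    The law W_dot = -Gamma * phi * (e^T P B) is the classical MRAC
    gradient update; applying it to a deep network's last layer is a
    hypothetical sketch of DMRAC, not its exact implementation.
    """
    W_dot = -Gamma * np.outer(phi, e @ P @ B)
    return W + dt * W_dot

def adaptive_element(W, phi):
    # Estimated matched uncertainty, nu_ad = W^T phi(x), which the
    # control law subtracts to recover reference-model behavior.
    return W.T @ phi
```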
Issue Date: 2020-07-13
Type: Thesis
URI: http://hdl.handle.net/2142/108463
Rights Information: Copyright 2020 Girish Joshi
Date Available in IDEALS: 2020-10-07
Date Deposited: 2020-08

