Files in this item



application/pdfZAYTSEV-THESIS-2017.pdf (992kB)
(no description provided)PDF


Title:Faster apprenticeship learning through inverse optimal control
Author(s):Zaytsev, Andrey
Advisor(s):Peng, Jian
Department / Program:Computer Science
Discipline:Computer Science
Degree Granting Institution:University of Illinois at Urbana-Champaign
Subject(s):Apprenticeship learning
Inverse reinforcement learning
Inverse optimal control
Deep learning
Reinforcement learning
Machine learning
Abstract:One of the fundamental problems of artificial intelligence is learning how to behave optimally. With applications ranging from self-driving cars to medical devices, this task is vital to modern society. There are two complementary problems in this area – reinforcement learning and inverse reinforcement learning. While reinforcement learning tries to find an optimal strategy in a given environment with known rewards for each action, inverse reinforcement learning or inverse optimal control seeks to recover rewards associated with actions given the environment and an optimal policy. Typically, apprenticeship learning is approached as a combination of these two techniques. This is an iterative process – at each step inverse reinforcement learning is applied first to get the rewards, followed by reinforcement learning to produce a guess for an optimal policy. Each guess is used in the further iterations to come up with a more accurate estimate of the reward function. While this works for problems with a small number of discreet states, the approach scales poorly. In order to mitigate those limitations, this research proposes a robust approach based on recent advances in the field of deep learning. Using the matrix formulation of inverse reinforcement learning, a reward function and an optimal policy can be recovered without having to iteratively optimize both. The approach scales well for problems with very large and continuous state spaces such as autonomous vehicle navigation. An evaluation performed using OpenAI RLLab suggests that this method is robust and ready to be adopted for solving problems both in research and various industries.
Issue Date:2017-12-05
Rights Information:Copyright 2017 Andrey Zaytsev
Date Available in IDEALS:2018-03-13
Date Deposited:2017-12

This item appears in the following Collection(s)

Item Statistics