Files in this item



application/pdfCandido_Salvatore.pdf (1MB)
(no description provided)PDF


Title:Optimization for stochastic, partially observed systems using a sampling-based approach to learn switched policies
Author(s):Candido, Salvatore J.
Director of Research:Hutchinson, Seth A.
Doctoral Committee Chair(s):Hutchinson, Seth A.
Doctoral Committee Member(s):Meyn, Sean P.; Coleman, Todd P.; Zhou, Enlu
Department / Program:Electrical & Computer Eng
Discipline:Electrical & Computer Engr
Degree Granting Institution:University of Illinois at Urbana-Champaign
Subject(s):Partially Observable Markov Decision Processes (POMDP)
machine learning
motion planning
stochastic control
Abstract:We propose a new method for learning policies for large, partially observable Markov decision processes (POMDPs) that require long time horizons for planning. Computing optimal policies for POMDPs is an intractable problem and, in practice, dimensionality renders exact solutions essentially unreachable for even small real-world systems of interest. For this reason, we restrict the policies we learn to the class of switched belief-feedback policies in a manner that allows us to introduce domain expert knowledge into the planning process. This approach has worked well for the systems on which we have tested our approach, and we conjecture that it will be useful for many real-world systems of interest. Our approach is based on a method like value iteration to learn a switching law. Because the POMDP problem is intractable, we use a Monte Carlo approximation to evaluate system behavior and optimize a switching law based on sampling. We explicitly analyze the sensitivity of expected cost (performance) with respect to perturbations introduced by our approximations, and use that analysis to avoid approximation errors that are potentially disastrous when using the computed policy. We demonstrate results on discrete POMDP problems from the literature and a resource allocation problem modeled after a team of robots attempting to extinguish a forest fire. We then utilize our approach to build two algorithms that solve the minimum uncertainty robot navigation problem. We demonstrate that our approach can improve on techniques in the literature in terms of solution quality by demonstrating our results in simulation. Our second approach utilizes information-theoretic heuristics to drive the sampling-based learning procedure. We conjecture that this is a useful direction towards an efficient, general stochastic motion planning algorithm.
Issue Date:2011-08-25
Rights Information:Copyright 2011 Salvatore J. Candido
Date Available in IDEALS:2011-08-25
Date Deposited:2011-08

This item appears in the following Collection(s)

Item Statistics