Files in this item
Wei_Chen.pdf (application/pdf, 7 MB), no description provided

Title:Value function approximation architectures for neuro-dynamic programming
Author(s):Chen, Wei
Director of Research:Meyn, Sean P.
Doctoral Committee Chair(s):Meyn, Sean P.
Doctoral Committee Member(s):Hajek, Bruce; Hutchinson, Seth A.; Nedich, Angelia
Department / Program:Electrical & Computer Engineering
Discipline:Electrical & Computer Engineering
Degree Granting Institution:University of Illinois at Urbana-Champaign
Subject(s):Neuro-Dynamic Programming
Parametric Q-learning
Value Function Approximation
Processor Power Management
Data Center Power Management
Cross-Layer Wireless Control
Abstract:Neuro-dynamic programming is a class of powerful techniques for approximating the solution to dynamic programming equations. In their most computationally attractive formulations, these techniques provide an approximate solution only within a prescribed finite-dimensional function class, so the question that always arises is: how should the function class be chosen? In this dissertation, we first propose an approach based on the solutions to associated fluid and diffusion approximations. To evaluate this approach, we establish bounds on the approximation errors.

Next, we propose a novel parameterized Q-learning algorithm. Q-learning is a model-free method for computing the Q-function associated with an optimal policy, based on observations of states and actions. When the state space or policy space is too large, Q-learning is often impractical because there are too many Q-function values to update. One way to address this problem is to approximate the Q-function within a function class; however, such methods often require an explicit model of the system, as in the split sampling method introduced by Borkar. The proposed algorithm is a reinforcement learning (RL) method, in which the system dynamics are not known; it is based on approximations of the transition kernel of the Markov decision process (MDP).

Lastly, we apply these value function approximation techniques to several applications. In the power management model, we focus on processor speed control to balance performance against energy usage. We then extend the results to load balancing and power management for geographically distributed data centers providing grid regulation. In the cross-layer wireless control problem, network utility maximization (NUM) and adaptive modulation (AM) are combined to balance network performance against transmission power. In these applications, we show how to model real-world problems as MDPs under reasonable assumptions and necessary approximations. Approximate value functions are obtained for specific models and evaluated through bounds on the approximation errors. These approximate solutions are then used to construct basis functions for learning algorithms in the simulations.
Issue Date:2014-01-16
Rights Information:Copyright 2013 Wei Chen
Date Available in IDEALS:2014-01-16
Date Deposited:2013-12
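
As a rough illustration of the kind of algorithm discussed in the abstract, the sketch below shows generic Q-learning with a linear (basis-function) parameterization of the Q-function on a small synthetic MDP. It is not the dissertation's parameterized Q-learning algorithm: the basis phi, the toy transition kernel, the cost function, the exploration rate, and the step-size schedule are all assumptions chosen only to keep the example self-contained and runnable.

# Illustrative sketch only: Q-learning with a linear function class
# Q_theta(x, u) = theta . phi(x, u). The basis phi here is random and is
# NOT the fluid/diffusion-based basis constructed in the dissertation.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, d = 20, 3, 5                  # toy MDP sizes, basis dimension
phi = rng.normal(size=(n_states, n_actions, d))    # hypothetical basis functions
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # transition kernel
cost = rng.uniform(size=(n_states, n_actions))     # per-step cost
beta = 0.95                                        # discount factor

theta = np.zeros(d)                                # parameter vector
x = 0
for t in range(20_000):
    # epsilon-greedy action with respect to the current approximation
    if rng.random() < 0.1:
        u = rng.integers(n_actions)
    else:
        u = int(np.argmin(phi[x] @ theta))
    x_next = rng.choice(n_states, p=P[x, u])
    # temporal-difference error for the approximate Q-function (cost minimization)
    target = cost[x, u] + beta * np.min(phi[x_next] @ theta)
    delta = target - phi[x, u] @ theta
    # stochastic-approximation update with a diminishing gain
    theta += (1.0 / (1 + t / 100)) * delta * phi[x, u]
    x = x_next

print("learned parameter vector:", theta)

In the dissertation, by contrast, the basis functions are constructed from fluid and diffusion approximations of the value function rather than drawn at random, and the error bounds established there are used to assess the quality of the resulting approximation.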
