## Files in this item

FilesDescriptionFormat

application/pdf

JIANG-DISSERTATION-2015.pdf (2MB)
(no description provided)PDF

## Description

 Title: On the resolution of misspecification in stochastic optimization, variational inequality, and game-theoretic problems Author(s): Jiang, Hao Director of Research: Shanbhag, Uday Doctoral Committee Chair(s): Nedich, Angelia Doctoral Committee Member(s): Meyn, Sean; Beck, Carolyn Department / Program: Industrial & Enterprise Systems Engineering Discipline: Industrial Engineering Degree Granting Institution: University of Illinois at Urbana-Champaign Degree: Ph.D. Genre: Dissertation Subject(s): Misspecified Stochastic Optimization Misspecified Variational Inequality Misspecified Stochastic Nash Games Misspecified Markov Decision Processes Abstract: Traditionally, much of the research in the field of optimization algorithms has assumed that problem parameters are correctly specified. Recent efforts under the robust optimization framework have relaxed this assumption by allowing unknown parameters to vary in a prescribed uncertainty set and by subsequently solving for a worst-case solution. This dissertation considers a rather different approach in which the unknown or misspecified parameter is a solution to a suitably defined (stochastic) learning problem based on having access to a set of samples. Practical approaches in resolving such a set of coupled problems have been either sequential or direct variational approaches. In the case of the former, this entails the following steps: (i) a solution to the learning problem for parameters is first obtained; and (ii) a solution is obtained to the associated parametrized computational problem by using (i). Such avenues prove difficult to adopt particularly since the learning process has to be terminated finitely and consequently, in large-scale or stochastic instances, sequential approaches may often be corrupted by error. On the other hand, a variational approach requires that the problem may be recast as a possibly non-monotone stochastic variational inequality problem; but there are no known first-order (stochastic) schemes currently available for the solution of such problems. Motivated by these challenges, this thesis focuses on studying joint schemes of optimization and learning in three settings: (i) misspecified stochastic optimization and variational inequality problems, (ii) misspecified stochastic Nash games, (iii) misspecified Markov decision processes. In the first part of this thesis, we present a coupled stochastic approximation scheme which simultaneously solves both the optimization and the learning problems. The obtained schemes are shown to be equipped with almost sure convergence properties in regimes when the function $f$ is either strongly convex as well as merely convex. Importantly, the scheme displays the optimal rate for strongly convex problems while in merely convex regimes, through an averaging approach, we quantify the degradation associated with learning by noting that the error in function value after $K$ steps is $O(\sqrt{\ln(K)/K})$, rather than $O(\sqrt{1/K})$ when $\theta^*$ is available. Notably, when the averaging window is modified suitably, it can be see that the original rate of $O(\sqrt{1/K})$ is recovered. Additionally, we consider an online counterpart of the misspecified optimization problem and provide a non-asymptotic bound on the average regret with respect to an offline counterpart. We also extend these statements to a class of stochastic variational inequality problems, an object that unifies stochastic convex optimization problems and a range of stochastic equilibrium problems. Analogous almost-sure convergence statements are provided in strongly monotone and merely monotone regimes, the latter facilitated by using an iterative Tikhonov regularization. In the merely monotone regime, under a weak-sharpness requirement, we quantify the degradation associated with learning and show that expected error associated with $dist(x_k,X^*)$ is $O(\sqrt{\ln(K)/K})$. In the second part of this thesis, we present schemes for computing equilibria to two classes of convex stochastic Nash games complicated by a parametric misspecification, a natural concern in the control of large- scale networked engineered system. In both schemes, players learn the equilibrium strategy while resolving the misspecification: (1) Stochastic Nash games: We present a set of coupled stochastic approximation distributed schemes distributed across agents in which the first scheme updates each agent’s strategy via a projected (stochastic) gradient step while the second scheme updates every agent’s belief regarding its misspecified parameter using an independently specified learning problem. We proceed to show that the produced sequences converge to the true equilibrium strategy and the true parameter in an almost sure sense. Surprisingly, convergence in the equilibrium strategy achieves the optimal rate of convergence in a mean-squared sense with a quantifiable degradation in the rate constant; (2) Stochastic Nash-Cournot games with unobservable aggregate output: We refine (1) to a Cournot setting where we assume that the tuple of strategies is unobservable while payoff functions and strategy sets are public knowledge through a common knowledge assumption. By utilizing observations of noise-corrupted prices, iterative fixed-point schemes are developed, allowing for simultaneously learning the equilibrium strategies and the misspecified parameter in an almost-sure sense. In the third part of this thesis, we consider the solution of a finite-state infinite horizon Markov Decision Process (MDP) in which both the transition matrix and the cost function are misspecified, the latter in a parametric sense. We consider a data-driven regime in which the learning problem is a stochastic convex optimization problem that resolves misspecification. Via such a framework, we make the following contributions: (1) We first show that a misspecified value iteration scheme converges almost surely to its true counterpart and the mean-squared error after $K$ iterations is $O(\sqrt{1/K})$; (2) An analogous asymptotic almost-sure convergence statement is provided for misspecified policy iteration; and (3) Finally, we present a constant steplength misspecified Q-learning scheme and show that a suitable error metric is $O(\sqrt{1/K})$ + $O(\sqrt{δ})$ after K iterations where δ is a bound on the steplength. Issue Date: 2015-12-02 Type: Thesis URI: http://hdl.handle.net/2142/89033 Rights Information: Copyright 2015 Hao Jiang Date Available in IDEALS: 2016-03-02 Date Deposited: 2015-12
﻿