Files in this item



HUANG-DISSERTATION-2017.pdf (application/pdf, 610kB), restricted to U of Illinois


Title: Fast algorithms for Bayesian variable selection
Author(s): Huang, Xichen
Director of Research: Liang, Feng
Doctoral Committee Chair(s): Liang, Feng
Doctoral Committee Member(s): Fellouris, Georgios; Qu, Annie; Shao, Xiaofeng
Department / Program: Statistics
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Bayesian variable selection; variational Bayesian methods; online learning
Abstract: Variable selection for regression and classification models is an important but challenging problem. There are generally two approaches: one based on penalized likelihood, and the other based on the Bayesian framework. We focus on the Bayesian framework, in which a hierarchical prior is imposed on all unknown parameters, including the unknown variable set. The Bayesian approach has many advantages; for example, we can obtain the posterior distribution of the sub-models, and more accurate prediction may be obtained by model averaging. However, because the posterior distribution of the model parameters is usually not available in closed form, posterior inference that relies on Markov chain Monte Carlo (MCMC) has a high computational cost, especially in high-dimensional settings, which makes Bayesian approaches undesirable. To deal with datasets that have a large number of features, we aim to develop fast algorithms for Bayesian variable selection that approximate the true posterior distribution yet still return the right inference (at least asymptotically).

In this thesis, we start with a variational algorithm for linear regression. Our algorithm is based on the work of Carbonetto and Stephens (2012), with essential modifications including the updating scheme and truncation of posterior inclusion probabilities. We show that our algorithm achieves both frequentist and Bayesian variable selection consistency. We then extend our variational algorithm to logistic regression by incorporating the Polya-Gamma data-augmentation trick (Polson et al., 2013), which links our algorithm for linear regression with logistic regression. However, because the variational algorithm must update, at every iteration, the variational distributions of all the latent Polya-Gamma random variables, whose number equals the number of observations, it is slow when there is a huge number of observations, and may even be infeasible when the data are too large to be loaded into computer memory.
We therefore propose an online algorithm for logistic regression under the framework of online convex optimization. Our algorithm is fast and achieves accuracy (log-loss) similar to that of the state-of-the-art algorithm (the Follow-the-Regularized-Leader Proximal algorithm, FTRL-Proximal).
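For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of the per-coordinate FTRL-Proximal update for logistic regression as it is commonly presented. It is not the online algorithm proposed in the dissertation, whose details are not given in this record. The class name `FTRLProximal`, the hyperparameters `alpha`, `beta`, `l1`, `l2`, the helper `run_demo`, and the synthetic data are all assumptions made for illustration.

```python
import math
import random


class FTRLProximal:
    """Illustrative per-coordinate FTRL-Proximal learner for logistic regression."""

    def __init__(self, dim, alpha=0.5, beta=1.0, l1=0.1, l2=0.0):
        # alpha, beta control the per-coordinate learning rate;
        # l1, l2 are regularization strengths (all values are assumptions).
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = [0.0] * dim  # accumulated (shifted) gradients
        self.n = [0.0] * dim  # accumulated squared gradients

    def _weight(self, i):
        # Closed-form weight: exactly zero when |z_i| <= l1 (sparsity).
        z = self.z[i]
        if abs(z) <= self.l1:
            return 0.0
        sign = 1.0 if z > 0 else -1.0
        denom = (self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2
        return -(z - sign * self.l1) / denom

    def predict(self, x):
        # x is a sparse example: dict mapping feature index -> value.
        s = sum(self._weight(i) * v for i, v in x.items())
        s = max(-35.0, min(35.0, s))  # guard against overflow in exp
        return 1.0 / (1.0 + math.exp(-s))

    def update(self, x, y):
        # Predict first, then update only the coordinates active in x.
        p = self.predict(x)
        for i, v in x.items():
            w = self._weight(i)
            g = (p - y) * v  # gradient of the log-loss w.r.t. w_i
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w
            self.n[i] += g * g
        return p  # prediction made before seeing the label


def log_loss(p, y):
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def run_demo(n_examples=2000, seed=0):
    # Synthetic stream: only features 0 and 1 affect the label.
    rng = random.Random(seed)
    model = FTRLProximal(dim=5)
    total = 0.0
    for _ in range(n_examples):
        x = {i: rng.uniform(-1.0, 1.0) for i in range(5)}
        logit = 3.0 * x[0] - 3.0 * x[1]
        y = 1.0 if rng.random() < 1.0 / (1.0 + math.exp(-logit)) else 0.0
        total += log_loss(model.update(x, y), y)
    return model, total / n_examples


if __name__ == "__main__":
    model, avg_loss = run_demo()
    print("average progressive log-loss:", avg_loss)
```

Each example is seen once and then discarded, so memory use depends on the number of features rather than the number of observations, which is the property that makes online methods attractive when the data cannot be loaded into memory.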
Issue Date: 2017-07-10
Rights Information: Copyright 2017 Xichen Huang
Date Available in IDEALS: 2017-09-29
Date Deposited: 2017-08
