Statistical inference in high dimensional data and machine learning
Zhang, Yangfan
Permalink
https://hdl.handle.net/2142/115556
Description
Title
Statistical inference in high dimensional data and machine learning
Author(s)
Zhang, Yangfan
Issue Date
2022-04-22
Director of Research (if dissertation) or Advisor (if thesis)
Shao, Xiaofeng
Yang, Yun
Doctoral Committee Chair(s)
Shao, Xiaofeng
Yang, Yun
Committee Member(s)
Chen, Xiaohui
Zhu, Ruoqing
Department of Study
Statistics
Discipline
Statistics
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
Statistical Inference
High-dimensional Data
U-statistics
Stochastic Gradient Descent
Mean-field Variational Inference
Abstract
This thesis comprises four projects. In the first project, we study the non-asymptotic theory of mean-field variational inference and establish a Bernstein-von Mises (BvM) theorem for the variational distribution. We propose the evidence lower bound (ELBO) as a new criterion for model selection and demonstrate that it is asymptotically equivalent to the Bayesian information criterion (BIC) but can approximate the model evidence more accurately. Moreover, we show geometric convergence of the coordinate ascent variational inference (CAVI) algorithm under a parametric model framework.
In the second project, we propose a class of $L_q$-norm based test statistics for change-point detection in the mean of high-dimensional independent data. We establish asymptotic normality of the statistics and their independence across different values of $q$, so that they can be combined into an adaptive test with high power against both sparse and dense alternatives. Self-normalization is further applied to avoid variance estimation, which leads to pivotal statistics. We also propose a consistent estimator of the change-point location and combine it with a wild binary segmentation algorithm to estimate the number and locations of change points.
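To make the change-point setting of the second project concrete, the following is a minimal sketch of a classical CUSUM scan for a single mean shift, with the coordinates aggregated by an $L_q$ norm. It is an illustration under stated assumptions, not the thesis's method: the thesis uses U-statistic based and self-normalized statistics, which are not reproduced here. The function name `cusum_lq_scan` and its arguments are hypothetical.

```python
import numpy as np

def cusum_lq_scan(x, q=2):
    """Scan all split points of an (n, p) data matrix.

    Hypothetical helper: plain CUSUM contrast aggregated by an L_q norm,
    not the U-statistic or self-normalized statistic from the thesis.
    Returns (k_hat, stats), where k_hat is the number of observations
    placed before the estimated change point.
    """
    n, _ = x.shape
    csum = np.cumsum(x, axis=0)            # running column sums
    total = csum[-1]
    ks = np.arange(1, n)                   # candidate split sizes 1..n-1
    left_mean = csum[:-1] / ks[:, None]
    right_mean = (total - csum[:-1]) / (n - ks)[:, None]
    weight = np.sqrt(ks * (n - ks) / n)    # standard CUSUM weighting
    contrast = weight[:, None] * (left_mean - right_mean)
    stats = np.linalg.norm(contrast, ord=q, axis=1)  # L_q norm per split
    return int(ks[np.argmax(stats)]), stats
```

Larger values of `q` put more weight on the largest coordinates of the contrast, which loosely mirrors why different $q$ favor sparse or dense alternatives.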
In the third project, we again propose a class of $L_q$-norm based U-statistics for high-dimensional independent data, but focus on global testing of model parameters. The statistics apply to many testing problems, including testing a mean vector and its spatial sign, simultaneous testing of linear model coefficients, and testing component-wise independence of high-dimensional observations. A variant of the proposed U-statistic with monotone indices is also considered, to which dynamic programming can be applied to alleviate the computational burden.
In the fourth project, we propose an online method based on perturbed stochastic gradient descent (SGD) to efficiently construct confidence intervals for the true parameters. The method inherits the online nature of SGD and requires only two or four parallel runs of SGD-type algorithms to obtain a confidence interval in any fixed direction. We further combine our method with the upper confidence bound (UCB) algorithm to address the bandit problem.
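As a small illustration of the U-statistic idea behind the third project, the sketch below computes the standard order-two U-statistic $\frac{1}{n(n-1)}\sum_{i \neq j} x_i^\top x_j$, which is an unbiased estimator of $\|\mu\|_2^2$ for independent observations with common mean $\mu$. This covers only the $q = 2$ mean-testing case and is an assumption-laden example, not the general $L_q$ construction or the monotone-index variant from the thesis. The function name `mean_norm_sq_ustat` is hypothetical.

```python
import numpy as np

def mean_norm_sq_ustat(x):
    """Order-two U-statistic estimating ||mu||_2^2 from an (n, p) matrix.

    Hypothetical helper for the q = 2 case only. Uses the identity
    sum_{i != j} x_i . x_j = ||sum_i x_i||^2 - sum_i ||x_i||^2,
    so the cost is O(n p) instead of O(n^2 p).
    """
    n = x.shape[0]
    s = x.sum(axis=0)
    cross = s @ s - np.einsum("ij,ij->", x, x)  # drop the i == j terms
    return cross / (n * (n - 1))
```

Removing the diagonal terms is what makes the estimator unbiased: the terms $\|x_i\|^2$ carry the noise variance, which would otherwise dominate when the dimension is large.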