Files in this item

WU-DISSERTATION-2021.pdf (application/pdf, 2MB), restricted to U of Illinois
(no description provided)
Title: Change point detection for high dimensional data and valid inference for Bayesian linear models
Author(s): Wu, Teng
Director of Research: Shao, Xiaofeng; Narisetty, Naveen Naidu
Doctoral Committee Chair(s): Shao, Xiaofeng; Narisetty, Naveen Naidu
Doctoral Committee Member(s): Li, Bo; Yang, Yun
Department / Program: Statistics
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Change point detection
Sequential monitoring
Quantile regression
Bayesian modeling
High dimensional statistics
Linear model
Abstract: We propose statistical methodologies for high dimensional change point detection and for inference in Bayesian linear models.

In the first project, we propose a change point detection method that tests for a mean shift in high dimensional observations with unknown heteroscedasticity. The proposed tests target a dense alternative, and a wild bootstrap procedure is used to approximate the unknown limiting distribution. The bootstrap test is free of tuning parameters, and we derive bootstrap consistency under the null. We extend the theoretical results to testing for multiple change points and justify the size and power of the tests. To estimate unknown change point locations, we utilize the wild binary segmentation algorithm. Empirical studies show that our methods have the correct size and better power than the existing approach when heteroscedasticity is present.

In the second project, we propose a class of monitoring statistics for a mean shift in a sequence of high-dimensional observations. Inspired by recent U-statistic based retrospective tests, we advance the U-statistic based approach to the sequential monitoring problem by developing a new adaptive monitoring procedure that can detect both dense and sparse changes in real time. Unlike existing work based on self-normalization, we introduce a class of estimators for the $q$-norm of the covariance matrix and prove their ratio consistency. To facilitate fast computation, we further develop recursive algorithms that improve the computational efficiency of the monitoring procedure. The advantages of the proposed methodology are demonstrated via simulation studies and real data illustrations.

In the third project, we propose a score-based working likelihood function for quantile regression that can perform inference for an arbitrary number of conditional quantiles simultaneously. We show that the proposed likelihood can be used in a Bayesian framework to obtain valid frequentist inference, whereas the commonly used asymmetric Laplace working likelihood leads to invalid interval estimates that require further correction. For computation, we propose a novel adaptive importance sampling algorithm to compute important posterior summaries such as the posterior mean and the covariance matrix. Our approach makes it feasible to perform valid inference for parameters such as slope differences across quantile levels, which is either not possible or cumbersome with existing Bayesian approaches. Empirical results demonstrate that the proposed likelihood has good estimation and inferential properties and that the proposed computational algorithm is more efficient than its competitors.

In the fourth project, we propose a new Bayesian method for valid inference on low dimensional parameters in high dimensional linear models under sparsity constraints. The idea is to use quasi-Bayesian posteriors based on partial regression models to remove the effect of high dimensional nuisance variables and to generate posterior samples of the parameters of interest for valid uncertainty quantification. We call the final distribution used for inference the "conditional Bayesian posterior," as it is constructed conditionally on the quasi-posterior distributions of the other parameters and does not admit a fully Bayesian interpretation. Unlike existing Bayesian regularization methods, our method can quantify the estimation uncertainty for arbitrarily small signals and therefore does not require variable selection consistency to guarantee its validity. Theoretically, we show that the resulting Bayesian credible intervals achieve the desired coverage probabilities in the frequentist sense. Methodologically, the proposed Bayesian framework can easily incorporate popular Bayesian regularization procedures, such as those based on spike-and-slab priors and horseshoe priors, to facilitate high accuracy estimation and inference. Numerically, the proposed method is demonstrated to have competitive empirical performance in extensive simulation studies and a real data analysis.
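To make the first project's approach concrete, here is a minimal illustrative sketch of a wild-bootstrap mean-shift test: a max-type CUSUM statistic on the rows of a data matrix, calibrated by multiplying the centered observations by Rademacher signs. This is a generic textbook-style construction, not the dissertation's exact test statistic; the function names, the 200-replicate default, and the use of an L2-norm CUSUM are assumptions for illustration.

```python
import numpy as np

def cusum_stat(X):
    """Max L2-norm CUSUM statistic for a mean shift in the rows of X (n x p)."""
    n = X.shape[0]
    total = X.sum(axis=0)
    csum = np.cumsum(X, axis=0)
    stats = []
    for k in range(1, n):
        # partial-sum deviation from the overall mean trend at split point k
        diff = csum[k - 1] - (k / n) * total
        stats.append(np.sum(diff ** 2) / n)
    return max(stats)

def wild_bootstrap_pvalue(X, n_boot=200, seed=0):
    """Wild bootstrap p-value: perturb centered rows with Rademacher multipliers.

    Centering mimics the no-change null; random signs destroy any systematic
    shift pattern while preserving the second-order structure of the data.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    obs = cusum_stat(X)
    exceed = 0
    for _ in range(n_boot):
        eps = rng.choice([-1.0, 1.0], size=n)
        if cusum_stat(Xc * eps[:, None]) >= obs:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)
```

On data with a clear mid-sample mean shift, the bootstrap replicates rarely exceed the observed statistic and the p-value is small; on homogeneous data, the p-value is roughly uniform.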
Issue Date: 2021-04-23
Rights Information: Copyright 2021 Teng Wu
Date Available in IDEALS: 2021-09-17
Date Deposited: 2021-05
