Files in this item

ZHU-DISSERTATION-2020.pdf (application/pdf, 1MB), Restricted Access

Title: Statistical inference for high-dimensional data
Author(s): Zhu, Changbo
Director of Research: Shao, Xiaofeng
Doctoral Committee Chair(s): Shao, Xiaofeng
Doctoral Committee Member(s): Chen, Xiaohui; Fellouris, Georgios; Marden, John I.
Department / Program: Statistics
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): High dimensionality; Independence test; Two Sample Test; Change Point Detection; Power Analysis
Abstract: Statistical inference is the process of using collected observations to deduce properties of the underlying data-generating process. In this thesis, we investigate three important problems in high-dimensional statistics and develop new methods and theory that reveal the limitations of some existing approaches and motivate the use of our proposed methods.

In the first chapter, we study distance covariance, Hilbert-Schmidt covariance (also known as the Hilbert-Schmidt independence criterion; Gretton et al. (2008)), and related independence tests in the high-dimensional setting. We show that the sample distance/Hilbert-Schmidt covariance between two random vectors can be approximated by the sum of squared componentwise sample cross-covariances up to an asymptotically constant factor, which indicates that tests based on distance/Hilbert-Schmidt covariance can capture only linear dependence in high dimension. Under the assumption that the components within each high-dimensional vector are weakly dependent, the distance correlation based t test developed by Szekely and Rizzo (2013) for independence is shown to have trivial limiting power when the two random vectors are nonlinearly dependent but componentwise uncorrelated. This new and surprising phenomenon, which appears to be reported here for the first time, is further confirmed in our simulation study. As a remedy, we propose tests based on an aggregation of marginal sample distance/Hilbert-Schmidt covariances and show their superior power relative to their joint counterparts in simulations. We further extend the distance correlation based t test to tests based on Hilbert-Schmidt covariance and marginal distance/Hilbert-Schmidt covariance. A novel unified approach is developed to analyze the studentized sample distance/Hilbert-Schmidt covariance as well as the studentized sample marginal distance covariance under both the null and alternative hypotheses. Our theoretical and simulation results shed light on the limitations of distance/Hilbert-Schmidt covariance when used jointly in the high-dimensional setting and suggest the aggregation of marginal distance/Hilbert-Schmidt covariances as a useful alternative.

In the second chapter, we study a class of two-sample test statistics based on inter-point distances in the high-dimensional, low/medium sample size setting. Our test statistics include the well-known energy distance and maximum mean discrepancy with Gaussian and Laplacian kernels, and the critical values are obtained via permutations. We show that all these tests are inconsistent when the two high-dimensional distributions share the same marginal distributions but differ in other aspects. The tests based on energy distance and maximum mean discrepancy mainly target differences between marginal means and variances, whereas the test based on L1-distance can capture differences in marginal distributions. Our theory sheds new light on the limitations of inter-point distance based tests, the impact of different distance metrics, and the behavior of permutation tests in high dimension. Simulation results and a real data illustration are also presented to corroborate our theoretical findings.

In the third chapter, we propose a new methodology for change point detection in high-dimensional time series. We extend the U-statistic based approach of Wang et al. (2019) by applying the trimming technique and utilizing the self-normalization principle. Under fixed-b asymptotics, where the ratio of the trimming parameter to the sample size is held fixed, we derive the limiting distributions of our test statistic under both the null and local alternatives of a single mean change. Furthermore, we combine our test statistic with the wild binary segmentation procedure to perform change point estimation. Simulations demonstrate that the trimming technique is effective and necessary for both testing and estimation when there is strong temporal dependence. As an important theoretical contribution, we derive the weak convergence of the U-statistic based processes for high-dimensional linear processes and show the applicability of the Beveridge-Nelson (BN) decomposition in high dimension.
Issue Date: 2020-07-15
Rights Information: Copyright 2020 Changbo Zhu
Date Available in IDEALS: 2020-10-07
Date Deposited: 2020-08