Files in this item

File:WANG-DISSERTATION-2020.pdf (1MB)
Description:(no description provided)
Format:PDF (application/pdf)

Description

Title:Statistical inference for high-dimensional data via U-statistics
Author(s):Wang, Runmin
Director of Research:Shao, Xiaofeng
Doctoral Committee Chair(s):Shao, Xiaofeng
Doctoral Committee Member(s):Chen, Xiaohui; Fellouris, Georgios; Simpson, Douglas G
Department / Program:Statistics
Discipline:Statistics
Degree Granting Institution:University of Illinois at Urbana-Champaign
Degree:Ph.D.
Genre:Dissertation
Subject(s):High-dimensional data; U-statistics
Abstract:Owing to advances in science and technology, there has been a surge of interest in high-dimensional data. Many methods developed for the low- or fixed-dimensional setting may not be theoretically valid in this new setting, and some are not even applicable when the dimensionality exceeds the sample size. To circumvent the difficulties posed by high dimensionality, we consider U-statistic-based methods. In this thesis, we investigate the theoretical properties of U-statistics in the high-dimensional setting and develop novel U-statistic-based methods for three problems.

In the first chapter, we propose a new formulation of self-normalization for inference about the mean of high-dimensional stationary processes by using a U-statistic-based approach. Self-normalization has attracted considerable attention in the recent time series literature, but its scope of applicability has been limited to low-/fixed-dimensional parameters for low-dimensional time series. Our original test statistic is a U-statistic with a trimming parameter to remove the bias caused by weak dependence. Under the framework of nonlinear causal processes, we show the asymptotic normality of our U-statistic, with a convergence rate that depends on the order of the Frobenius norm of the long-run covariance matrix. The self-normalized test statistic is then constructed from recursive subsampled U-statistics, and its limiting null distribution is shown to be a functional of time-changed Brownian motion, which differs from the pivotal limit used in the low-dimensional setting. An interesting phenomenon associated with self-normalization is that it works in the high-dimensional context even if the convergence rate of the original test statistic is unknown. We also present applications to testing for bandedness of the covariance matrix and testing for white noise for high-dimensional stationary time series, and compare the finite-sample performance with existing methods in simulation studies. At the root of our theoretical arguments, we extend the martingale approximation to the high-dimensional setting, which could be of independent theoretical interest.

In the second chapter, we consider change point testing and estimation for high-dimensional data. In the case of testing for a mean shift, we propose a new test that is based on U-statistics and utilizes the self-normalization principle. Our test targets dense alternatives in the high-dimensional setting and involves no tuning parameters. The weak convergence of a sequential U-statistic-based process is shown as an important theoretical contribution. Extensions to testing for multiple unknown change points in the mean and to testing for changes in the covariance matrix are also presented, with rigorous asymptotic theory and encouraging simulation results. Additionally, we illustrate how our approach can be used in combination with wild binary segmentation to estimate the number and locations of multiple unknown change points.

In the third chapter, we consider estimation and inference for the location of a single change point in the mean of independent high-dimensional data. Our change point location estimator maximizes a new U-statistic-based objective function, and its convergence rate and asymptotic distribution after suitable centering and normalization are obtained under mild assumptions. Our estimator turns out to be more efficient than its least-squares-based counterpart in the literature. Based on the asymptotic theory, we construct a confidence interval by plugging in consistent estimates of several quantities in the normalization. We also provide a bootstrap-based confidence interval and state its asymptotic validity under suitable conditions. Through simulation studies, we demonstrate favorable finite-sample performance of the new change point location estimator compared with its least-squares-based counterpart, and of our bootstrap-based confidence intervals compared with several existing competitors. The asymptotic theory based on high-dimensional U-statistics is substantially different from that developed in the literature and is of independent interest.
Issue Date:2020-07-14
Type:Thesis
URI:http://hdl.handle.net/2142/108476
Rights Information:Copyright 2020 Runmin Wang
Date Available in IDEALS:2020-10-07
Date Deposited:2020-08

