Files in this item

YU-DISSERTATION-2020.pdf (1 MB, application/pdf)


Title: High-dimensional change point detection for mean and location parameters
Author(s): Yu, Mengjia
Director of Research: Chen, Xiaohui
Doctoral Committee Chair(s): Chen, Xiaohui
Doctoral Committee Member(s): Qu, Annie; Shao, Xiaofeng; Simpson, Douglas G.
Department / Program: Statistics
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): High-dimensional data
Change point analysis
Gaussian approximation
Binary segmentation
Abstract: Change point inference refers to the detection of structural breaks in a sequence of observations, which may exhibit one or more distributional shifts under models such as mean or covariance changes. In this dissertation, we consider the offline multiple change point problem, in which the sample size is fixed in advance or after observation. In particular, we concentrate on the high-dimensional setup, where the dimension $p$ can be much larger than the sample size $n$ and traditional distributional assumptions can easily fail. The goal is to employ non-parametric approaches that identify change points without intermediate estimation of the cross-sectional dependence. In the first part, we consider cumulative sum (CUSUM) statistics, which are widely used in change point inference and identification. We study two problems for high-dimensional mean vectors based on the $\ell^{\infty}$-norm of the CUSUM statistics. For the problem of testing for the existence of a change point in an independent sample generated from the mean-shift model, we introduce a Gaussian multiplier bootstrap to calibrate critical values of the CUSUM test statistics in high dimensions. The proposed bootstrap CUSUM test is fully data-dependent, and it has strong theoretical guarantees under arbitrary dependence structures and mild moment conditions. Specifically, we show that with a boundary removal parameter the bootstrap CUSUM test enjoys uniform validity in size under the null and achieves the minimax separation rate under sparse alternatives when $p \gg n$. Once a change point is detected, we estimate its location by maximizing the $\ell^{\infty}$-norm of generalized CUSUM statistics at two different weighting scales. The first estimator is based on the covariance-stationary CUSUM statistics, and we prove its consistency in estimating the location at the nearly parametric rate $n^{-1/2}$ for sub-exponential observations.
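The $\ell^{\infty}$-norm CUSUM test with Gaussian multiplier bootstrap calibration can be illustrated with a minimal sketch. This is not the dissertation's implementation: the 10% boundary-removal fraction, the number of bootstrap draws, and the covariance-stationary weighting $\sqrt{s(n-s)/n}$ are illustrative assumptions.

```python
import numpy as np

def cusum_stats(X):
    """Covariance-stationary CUSUM statistics at every split point.

    X : (n, p) array. Returns an (n-1, p) array whose row s-1 is
    sqrt(s*(n-s)/n) * |mean(X[:s]) - mean(X[s:])|, coordinate-wise.
    """
    n, p = X.shape
    cs = np.cumsum(X, axis=0)            # partial sums S_1, ..., S_n
    s = np.arange(1, n)[:, None]         # split points s = 1, ..., n-1
    diff = cs[:-1] / s - (cs[-1] - cs[:-1]) / (n - s)
    return np.sqrt(s * (n - s) / n) * np.abs(diff)

def cusum_test(X, n_boot=200, boundary=0.1, alpha=0.05, rng=None):
    """l_inf-norm CUSUM test; critical value from a Gaussian multiplier
    bootstrap applied to the centered sample (a sketch, with an assumed
    boundary-removal fraction)."""
    rng = np.random.default_rng(rng)
    n, _ = X.shape
    lo, hi = int(n * boundary), n - int(n * boundary)   # boundary removal
    T = cusum_stats(X)[lo:hi].max()                     # test statistic
    Xc = X - X.mean(axis=0)                             # centered data
    boot = np.empty(n_boot)
    for b in range(n_boot):
        e = rng.standard_normal((n, 1))                 # Gaussian multipliers
        boot[b] = cusum_stats(Xc * e)[lo:hi].max()
    return T, np.quantile(boot, 1 - alpha)              # reject if T > quantile
```

Because the bootstrap reuses the observed (centered) data, the calibration adapts to the unknown cross-sectional dependence without estimating it explicitly.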
The second estimator is based on non-stationary CUSUM statistics, which assign less weight to data points near the boundary. In the latter case, we show that it achieves the nearly best possible rate of convergence, of order $n^{-1}$. In both cases, the dimension impacts the rate of convergence only through logarithmic factors, so consistency of the CUSUM location estimators is possible even when $p$ is much larger than $n$. In the presence of multiple change points, we propose a principled bootstrap-assisted binary segmentation (BABS) algorithm that dynamically adjusts the change point detection rule and recursively estimates the change point locations. We derive its rate of convergence under suitable signal separation and strength conditions. The results derived are non-asymptotic, and we provide extensive simulation studies to assess the finite-sample performance; the empirical evidence shows an encouraging agreement with our theoretical results. In the second part, we analyze the problem of change point detection for high-dimensional distributions in a location family. We propose a robust, tuning-free (i.e., fully data-dependent), and easy-to-implement change point test formulated in the multivariate $U$-statistics framework with anti-symmetric and nonlinear kernels. It achieves robustness in a non-parametric setting where CUSUM statistics are sensitive to outliers and heavy-tailed distributions. Specifically, the within-sample noise is canceled out by the anti-symmetry of the kernel, while the signal distortion under certain nonlinear kernels can be controlled so that the between-sample change point signal is magnitude-preserving. A (half) jackknife multiplier bootstrap (JMB) tailored to the change point detection setting is proposed to calibrate the distribution of our $\ell^{\infty}$-norm aggregated test statistic.
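The bootstrap-assisted binary segmentation idea — test each segment, and if the test rejects, split at the CUSUM argmax and recurse on both sides — can be sketched as below. This is a simplified illustration, not the BABS algorithm as analyzed in the dissertation: the minimum segment length, bootstrap size, and per-segment quantile level are illustrative assumptions.

```python
import numpy as np

def max_cusum(X):
    """Return (max l_inf CUSUM value, argmax split point) for segment X."""
    n = len(X)
    cs = np.cumsum(X, axis=0)
    s = np.arange(1, n)[:, None]
    diff = cs[:-1] / s - (cs[-1] - cs[:-1]) / (n - s)
    vals = (np.sqrt(s * (n - s) / n) * np.abs(diff)).max(axis=1)
    k = int(vals.argmax())
    return vals[k], k + 1

def boot_threshold(X, n_boot=200, alpha=0.05, rng=None):
    """Multiplier-bootstrap critical value recomputed on the current segment,
    so the detection rule adapts as segmentation proceeds."""
    rng = np.random.default_rng(rng)
    Xc = X - X.mean(axis=0)
    stats = [max_cusum(Xc * rng.standard_normal((len(X), 1)))[0]
             for _ in range(n_boot)]
    return np.quantile(stats, 1 - alpha)

def babs(X, lo=0, hi=None, min_len=20, found=None, rng=0):
    """Bootstrap-assisted binary segmentation (hedged sketch)."""
    if found is None:
        found, hi = [], len(X)
    if hi - lo < 2 * min_len:          # segment too short to test
        return found
    seg = X[lo:hi]
    stat, k = max_cusum(seg)
    if stat > boot_threshold(seg, rng=rng):
        cp = lo + k                    # estimated change point location
        found.append(cp)
        babs(X, lo, cp, min_len, found, rng)   # recurse on left piece
        babs(X, cp, hi, min_len, found, rng)   # recurse on right piece
    return sorted(found)
```

The key point mirrored from the text is that the critical value is re-bootstrapped on each sub-segment rather than fixed once globally.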
Subject to mild moment conditions on the kernels, we derive uniform rates of convergence for the JMB approximation to the sampling distribution of the test statistic, and we analyze its size and power properties. Extensions to multiple change point testing and estimation are discussed, with illustrations from numerical studies.
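The anti-symmetric-kernel construction of the second part can be sketched with the coordinate-wise sign kernel $h(x,y)=\mathrm{sign}(x-y)$, which satisfies $h(x,y)=-h(y,x)$ and is robust to outliers and heavy tails. The CUSUM-type weight and the specific kernel are illustrative assumptions; the dissertation's statistic and its JMB calibration are more general.

```python
import numpy as np

def ustat_scan(X):
    """l_inf-aggregated two-sample U-statistic scan over candidate splits,
    using the anti-symmetric sign kernel h(x, y) = sign(x - y).

    Anti-symmetry makes within-sample pairs cancel in expectation, while a
    between-sample location shift leaves a nonzero mean signal.
    """
    n, p = X.shape
    H = np.sign(X[:, None, :] - X[None, :, :])   # (n, n, p) pairwise kernel values
    best = 0.0
    for s in range(1, n):                        # candidate split points
        U = H[:s, s:].mean(axis=(0, 1))          # between-sample U-statistic
        w = np.sqrt(s * (n - s) / n)             # CUSUM-type weight (assumption)
        best = max(best, w * np.abs(U).max())    # l_inf aggregation
    return best
```

Even under Cauchy noise, where sample means are useless, the sign kernel keeps the shift signal bounded and detectable.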
Issue Date: 2020-06-26
Rights Information: Copyright 2020 Mengjia Yu
Date Available in IDEALS: 2020-10-07
Date Deposited: 2020-08
