Files in this item

Hyun Keun_Cho.pdf (application/pdf, 477kB)

Title: Model selection for correlated data and moment selection from high-dimensional moment conditions
Author(s): Cho, Hyun Keun
Director of Research: Qu, Annie
Doctoral Committee Chair(s): Qu, Annie
Doctoral Committee Member(s): Douglas, Jeffrey A.; Shao, Xiaofeng; Simpson, Douglas G.
Department / Program: Statistics
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): diverging number of parameters; dynamic panel data models; generalized method of moments; high-dimensional moment conditions; moment selection; longitudinal data; model selection; oracle property; quadratic inference function; smoothly clipped absolute deviation (SCAD); singularity matrix
Abstract: High-dimensional correlated data arise frequently in many studies. My primary research interests lie broadly in statistical methodology for correlated data such as longitudinal data and panel data. In this thesis, we address two important but challenging issues: model selection for correlated data with a diverging number of parameters, and consistent moment selection from high-dimensional moment conditions. Longitudinal data arise frequently in biomedical and genomic research, where repeated measurements within subjects are correlated. It is important to select relevant covariates when the dimension of the parameters diverges as the sample size increases. We propose the penalized quadratic inference function to perform model selection and estimation simultaneously in the framework of a diverging number of regression parameters. The penalized quadratic inference function can easily take correlation information from clustered data into account, yet it does not require specifying the likelihood function. This is advantageous compared to existing model selection methods for discrete data with large cluster sizes. In addition, the proposed approach enjoys the oracle property: it identifies non-zero components consistently with probability tending to 1, and any finite linear combination of the estimated non-zero components has an asymptotic normal distribution. We propose an efficient algorithm, with an effective choice of tuning parameter, to solve the penalized quadratic inference function. Monte Carlo simulation studies show that the proposed method selects the correct model with high frequency and estimates covariate effects accurately, even when the dimension of the parameters is high. We illustrate the proposed approach by analyzing periodontal disease data.
The generalized method of moments (GMM) approach combines moment conditions optimally to obtain efficient estimation without specifying the full likelihood function. However, the GMM estimator can be infeasible when the number of moment conditions exceeds the sample size. This research addresses problems in which the dimension of the estimating equations or moment conditions far exceeds the sample size, such as selecting an informative correlation structure or modeling dynamic panel data. We propose a Bayesian information-type criterion to select the optimal number of linear combinations of moment conditions. In theory, we show that the proposed criterion leads to consistent selection of the number of principal components for the weighting matrix in the GMM. Monte Carlo studies indicate that the proposed method outperforms existing methods in the sense of reducing bias and improving the efficiency of estimation. We also illustrate moment selection on a real data example using dynamic panel data models.
Issue Date: 2013-08-22
Rights Information: Copyright 2013 Hyun Keun Cho
Date Available in IDEALS: 2013-08-22
Date Deposited: 2013-08
