Files in this item



KINSON-DISSERTATION-2017.pdf (application/pdf, 696kB)


Title: Longitudinal principal components analysis for binary and continuous data
Author(s): Kinson, Christopher Leron
Director of Research: Qu, Annie
Doctoral Committee Chair(s): Qu, Annie
Doctoral Committee Member(s): Culpepper, Steven; Marden, John; Simpson, Douglas
Department / Program: Statistics
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Non-parametric spline; Odds ratio
Abstract: "Large-scale data", or "big data", is an enormously popular term in the data science and statistics communities. These datasets are often collected over periods of time, at hourly and weekly rates, with the help of technological advancements in physical and cloud-based storage. The information stored is useful, especially in biomedicine, insurance, and retail, where patients and customers are crucial to business survival. In this thesis, we develop new statistical methodologies for handling two types of datasets: continuous data and binary data. Time-varying associations among store products provide important information to capture changes in consumer shopping behavior. In the first part of this thesis, we propose a longitudinal principal component analysis (LPCA) using a random-effects eigen-decomposition, where the eigen-decomposition utilizes longitudinal information over time to model time-varying eigenvalues and eigenvectors of the corresponding covariance matrices. Our method can effectively analyze large marketing data containing sales information for selected consumer products from hundreds of stores over an 11-year time period. The proposed method leads to more accurate estimation and interpretation than comparable approaches, which is illustrated through finite sample simulations. We show our method's capabilities and provide an interpretation of the eigenvector estimates in an application to IRI marketing data. In the second part of this thesis, we formulate the LPCA problem for binary data. We propose capturing the associations among the products or variables through the odds ratios, where a two-by-two contingency table contains probabilities representing the joint distribution of two binary products. The eigen-decomposition utilizes longitudinal information over time to model time-varying eigenvalues and eigenvectors of the corresponding odds ratio matrices. These odds ratio matrices measure the pairwise associations among the binary products and are more appropriate to use than the Pearson correlation coefficient. Our method illustrates an improvement in visualization and interpretation through simulation studies and an application to IRI panel data of individual customer purchases.
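As an illustration of the pairwise odds ratio that the abstract builds on, the following is a minimal sketch and not the dissertation's implementation. It computes the empirical odds ratio from the two-by-two table of counts for a pair of binary variables at a single time point. The function names `odds_ratio` and `odds_ratio_matrix`, the `smoothing` parameter, and the example data are all hypothetical choices made for this sketch. The time-varying modeling and eigen-decomposition described in the abstract are not attempted here.

```python
import numpy as np


def odds_ratio(a, b, smoothing=0.0):
    """Empirical odds ratio of two binary vectors from their 2x2 table.

    `smoothing` is an assumption of this sketch: a constant added to every
    cell count so that empty cells do not produce a division by zero.
    """
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    n11 = np.sum(a & b) + smoothing      # both variables equal 1
    n10 = np.sum(a & ~b) + smoothing     # first is 1, second is 0
    n01 = np.sum(~a & b) + smoothing     # first is 0, second is 1
    n00 = np.sum(~a & ~b) + smoothing    # both variables equal 0
    return (n11 * n00) / (n10 * n01)


def odds_ratio_matrix(X, smoothing=0.5):
    """Symmetric matrix of pairwise odds ratios for the columns of X.

    The diagonal is left as NaN because the odds ratio of a variable
    with itself is not defined by the 2x2 table (illustrative choice).
    """
    X = np.asarray(X, dtype=bool)
    p = X.shape[1]
    out = np.full((p, p), np.nan)
    for j in range(p):
        for k in range(j + 1, p):
            out[j, k] = out[k, j] = odds_ratio(X[:, j], X[:, k], smoothing)
    return out
```

For example, with hypothetical purchase indicators `a = [1, 1, 1, 0, 0, 0, 1, 0]` and `b = [1, 1, 0, 0, 0, 1, 1, 0]`, the cell counts are 3, 1, 1, and 3, so `odds_ratio(a, b)` evaluates to (3 × 3) / (1 × 1) = 9.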
Issue Date: 2017-07-12
Rights Information: Copyright 2017 Christopher Kinson
Date Available in IDEALS: 2017-09-29
Date Deposited: 2017-08
