Files in this item: BOIARSKAIA-DISSERTATION-2016.pdf (application/pdf, 3MB)

Title: Recognizing cardiovascular disease patterns with machine learning using NHANES accelerometer determined physical activity data
Author(s): Boiarskaia, Elena
Director of Research: Zhu, Weimo
Doctoral Committee Chair(s): Zhu, Weimo
Doctoral Committee Member(s): Buchner, David; Liang, Feng; Wilund, Kenneth
Department / Program: Kinesiology & Community Health
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Machine learning; physical activity recommendations; cardiovascular disease risk; Reynolds risk score; classification algorithms; feature selection; random forest; decision tree; support vector machine; lasso regression; neural network
Abstract: The relationship between physical activity (PA) and cardiovascular disease (CVD) is well established. However, questions remain about the appropriate dose of PA needed to reduce CVD risk (Blair, LaMonte, & Nichaman, 2004; Pate et al., 1995). The optimal dose and the effects of PA intensity, duration, and frequency are not fully understood (Haskell et al., 2007). This study connects objectively measured PA with a cross-sectional measure of CVD risk to analyze in depth the PA patterns that contribute to higher CVD risk. Specifically, it applied machine learning algorithms to NHANES accelerometer data from the 2003-2006 cohorts, using the Reynolds cardiovascular risk score as the outcome. Pairing accelerometer data with the Reynolds risk score, a proxy for cardiovascular disease risk, makes it possible to use cross-sectional data when the longitudinal outcome is not known. A major benefit of using accelerometers to measure PA objectively is that the data are easy and inexpensive to obtain. Furthermore, accelerometers measure most locomotor activities with a high degree of accuracy. They can also gather highly detailed information about an individual's PA pattern over extended periods of time. The result is a large volume of data that requires specialized analytic techniques. This study used a variety of machine learning techniques to identify individual patterns in the data and to evaluate which factors contribute most to high CVD risk. A comparison of the machine learning algorithms shows that all classifiers perform well when given appropriate features. Predefined intensity thresholds (Troiano et al., 2008) were used to compute the average time spent in each PA category. These features yielded good classification of study participants at high and low risk for CVD. Adding PA pattern-related features to the model did not appear to improve classification. 
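The threshold-based features described above reduce to a simple tally. For each participant, count the minutes whose activity count falls in each intensity band, then average over valid days. The following minimal Python sketch shows such a tally and is not the dissertation's own code. Only the lifestyle band (760-2019 counts per minute) comes from this abstract. The other cut points (100, 2020, and 5999 counts per minute) are assumed values modeled on commonly cited adult thresholds (Troiano et al., 2008). The sketch also assumes that wear-time screening has already been done.

```python
from bisect import bisect_right

# Lower bound (counts per minute) of each intensity category.
# ASSUMPTION: only the lifestyle band (760-2019) is taken from the abstract;
# the 100, 2020 and 5999 cut points are illustrative assumed values.
CUTPOINTS = [0, 100, 760, 2020, 5999]
CATEGORIES = ["sedentary", "light", "lifestyle", "moderate", "vigorous"]


def categorize(count: int) -> str:
    """Map one minute's activity count to its intensity category."""
    if count < 0:
        raise ValueError("activity counts must be non-negative")
    # bisect_right gives the number of lower bounds <= count.
    return CATEGORIES[bisect_right(CUTPOINTS, count) - 1]


def average_minutes_per_category(days: list[list[int]]) -> dict[str, float]:
    """Average minutes per day spent in each category across valid days.

    Each inner list holds one day's minute-level counts. Non-wear
    minutes are assumed to have been removed beforehand.
    """
    if not days:
        raise ValueError("need at least one valid day")
    totals = {name: 0 for name in CATEGORIES}
    for day in days:
        for count in day:
            totals[categorize(count)] += 1
    return {name: totals[name] / len(days) for name in CATEGORIES}


if __name__ == "__main__":
    # Two toy "days" of four minutes each, for illustration only.
    print(average_minutes_per_category([[0, 50, 800, 2500], [120, 6000, 900, 30]]))
```

The resulting per-category averages are the kind of fixed-threshold inputs that a classifier such as a decision tree or lasso model could take alongside other covariates.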
Features derived using k-means and the Hidden Markov Model (HMM) performed on par with predefined intensity thresholds. This indicates that data-driven methods may be used for feature extraction without relying on prior knowledge of the data. In general, the lasso regression, support vector machine (SVM), and random forest (RF) classifiers all performed well on large sets of data-driven features. They achieved greater than 82% classification accuracy when time spent in PA intensity categories was combined with k-means- and HMM-derived inputs. Neural networks performed well on smaller, uncorrelated feature sets, and decision trees produced consistent results with the greatest transparency and interpretability. With respect to physical activity recommendations, the findings indicate that gender and time spent in lifestyle minutes (760-2019 counts per minute) play a key role in classifying CVD risk. Thus, greater emphasis on gender-specific recommendations that target lifestyle minutes, in addition to moderate and vigorous activity, may be necessary. Furthermore, time spent in the activity categories, rather than how PA is distributed throughout the day and week, appears to be most important for classifying CVD risk.
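Data-driven features replace fixed cut points by letting the data choose the intensity levels. The abstract does not describe the dissertation's exact k-means and HMM pipeline, so the following is only a hypothetical illustration. A one-dimensional k-means (Lloyd's algorithm) learns cluster centers from pooled minute counts. Each participant's minutes are then tallied per learned cluster, which mirrors a threshold-based tally. The function names, the deterministic initialization, and the toy data are assumptions made for this sketch. An HMM-based variant, which would also model transitions between intensity states, is not shown.

```python
def kmeans_1d(values: list[float], k: int, iters: int = 100) -> list[float]:
    """Lloyd's k-means on scalar values; returns sorted cluster centers.

    Hypothetical helper for illustration only.
    """
    distinct = sorted(set(values))
    if k < 1 or len(distinct) < k:
        raise ValueError("need k >= 1 and at least k distinct values")
    # Deterministic init: spread the starting centers across the distinct values.
    step = max(k - 1, 1)
    centers = [distinct[(i * (len(distinct) - 1)) // step] for i in range(k)]
    data = sorted(values)
    for _ in range(iters):
        # Assignment step: each value joins its nearest center.
        groups: list[list[float]] = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group.
        new_centers = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
        if new_centers == centers:
            break  # converged: assignments no longer change the centers
        centers = new_centers
    return sorted(centers)


def cluster_minutes(day: list[float], centers: list[float]) -> list[int]:
    """Count a day's minutes assigned to each learned cluster (hypothetical helper)."""
    counts = [0] * len(centers)
    for v in day:
        counts[min(range(len(centers)), key=lambda j: abs(v - centers[j]))] += 1
    return counts


if __name__ == "__main__":
    # Toy pooled counts standing in for accelerometer minutes.
    centers = kmeans_1d([0, 0, 5, 10, 1000, 1010, 5000, 5100], k=3)
    print(centers)
    print(cluster_minutes([0, 20, 990, 4800, 5200], centers))
```

In this framing, the learned centers play the role of the predefined intensity bands. The per-cluster minute counts can be fed to the same classifiers as the threshold features.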
Issue Date: 2016-07-12
Rights Information: Copyright 2016 Elena Boiarskaia
Date Available in IDEALS: 2016-11-10
Date Deposited: 2016-08