Files in this item

DADU-THESIS-2021.pdf (application/pdf, 5MB)

Title: Application of machine learning to the detection and prediction of Parkinson’s disease subtypes
Author(s): Dadu, Anant
Advisor(s): Campbell, Roy H.
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Neurodegenerative Disorders, Machine Learning, Disease Progression
Abstract: Background: The clinical manifestations of Parkinson’s disease (PD) are characterized by heterogeneity in age at onset, disease duration, rate of progression, and the constellation of motor versus non-motor features. As such, counseling of patients about their prognosis is guarded. Characterization of unique disease subtypes and enhanced, personalized disease course projections are both unmet needs. Machine learning’s ability to find hidden patterns in complicated, multi-dimensional datasets has opened up unprecedented possibilities for meeting this crucial demand.
Methods and Findings: We used unsupervised and supervised machine learning methods on comprehensive, longitudinal clinical data from the Parkinson’s Progression Markers Initiative (PPMI) (n = 294 cases) to identify patient subtypes and predict disease progression. The resulting models were validated in an independent, clinically well-characterized cohort from the Parkinson’s Disease Biomarkers Program (PDBP) (n = 263 cases). Our analysis distinguished three distinct disease subtypes with highly predictable progression rates, corresponding to slow, moderate, and fast disease progressors. In addition, we achieved highly accurate projections of disease progression five years after initial diagnosis with an average area under the curve (AUC) of 0.92 (95% confidence interval (CI): 0.95 ± 0.01 for slow PD progressors (PDvec1), 0.87 ± 0.03 for moderate PD progressors (PDvec2), and 0.95 ± 0.02 for fast disease progressors (PDvec3)). We then replicated these findings in an independent validation cohort, released the analytical code, and developed models in an open science manner. Finally, we found that daytime sleepiness has the highest importance in clinical progression, followed by hobbies, activities, and urinary problems. Furthermore, we observed that serum neurofilament light levels and genetic risk score also play an essential role in predicting PD subtypes.
Conclusions: These data-driven results could help deconstruct the heterogeneity of patients. This finding could have immediate implications for clinical trials by improving the detection of significant clinical outcomes previously hidden by cohort heterogeneity. Machine learning models are expected to improve patient counseling, clinical trial design, healthcare resource allocation, and, ultimately, personalized patient care. To achieve more powerful predictions, we propose that these datasets should include standardized phenotype collection and recording.
Issue Date: 2021-07-19
Rights Information: Copyright 2021 Anant Dadu
Date Available in IDEALS: 2022-01-12
Date Deposited: 2021-08
