Files in this item



LEU-THESIS-2021.pdf (application/pdf, 8 MB), Restricted Access


Title:Survival analysis for lung cancer patients
Author(s):Leu, Anh
Advisor(s):Do, Minh
Department / Program:Electrical & Computer Eng
Discipline:Electrical & Computer Engr
Degree Granting Institution:University of Illinois at Urbana-Champaign
Subject(s):survival analysis
lung cancer
lung cancer prognosis
Cox proportional hazards
Kaplan-Meier estimator
random survival forests
machine learning
Abstract:Cancer is one of the leading causes of death. Lung cancer, in particular, is the leading cause of cancer death in both men and women, accounting for 23% of all cancer deaths in 2019 according to the Centers for Disease Control and Prevention. Lung cancer also usually has a poor prognosis, with a five-year survival rate of only 21% according to the SEER Cancer Statistics Review, 1975-2017. For such a deadly disease, it is crucial to predict the survival likelihood of cancer patients. However, this is not an easy task because many factors affect disease progression. This thesis builds on the existing National Lung Screening Trial (NLST) dataset and provides an in-depth analysis of the features influencing lung cancer prognosis. We added nodule annotations to the NLST dataset and extracted radiomic features from each nodule. Using the newly acquired radiomic features together with the existing clinical data from the original NLST dataset, we examined different prognostic models that predict death from lung cancer starting from the first low-dose computed tomography (LDCT) scan. Models using both clinical and radiomic features outperform models using only clinical information, signifying the importance of the additional radiomic features: the best model using clinical input alone reaches a concordance index of 0.589, while the best model using a combination of clinical and radiomic features reaches 0.657. We rigorously cross-examined the relationships between features, and the models for each feature type, using exploratory data analysis and survival analysis models. For each feature type, we used one representative survival analysis model from each of three families: a semi-parametric method (Cox proportional hazards model), a non-parametric method (Kaplan-Meier estimator), and a machine learning approach (random survival forests).
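The abstract names the Kaplan-Meier estimator as the non-parametric baseline and reports model quality as a concordance index. As a rough illustration only (the thesis code is not available here), the stdlib-only Python sketch below computes a Kaplan-Meier survival curve and a simple form of Harrell's concordance index on right-censored data. The function names `kaplan_meier` and `harrell_c_index` and the toy data are hypothetical, not taken from the thesis.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times: follow-up time per patient; events: 1 if death observed, 0 if censored.
    S(t) is the product over event times t_i <= t of (1 - d_i / n_i).
    """
    deaths = Counter(t for t, e in zip(times, events) if e)  # d_i per event time
    survival = 1.0
    curve = []
    for t in sorted(deaths):
        # n_i: patients still under observation at t (censored at t included)
        at_risk = sum(1 for u in times if u >= t)
        survival *= 1.0 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve


def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable pairs in which the higher-risk patient dies first.

    A pair (i, j) is comparable when patient i has an observed event and
    times[i] < times[j]; pairs with tied follow-up times are skipped.
    Ties in risk score count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical toy data: days from first LDCT scan, death indicator, model risk score.
days = [200, 350, 350, 500, 720, 900]
died = [1, 1, 0, 1, 0, 0]
risk = [0.9, 0.7, 0.4, 0.6, 0.65, 0.2]

print(kaplan_meier(days, died))
print(harrell_c_index(days, died, risk))
```

A concordance index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported 0.589 and 0.657 in context. Libraries such as lifelines and scikit-survival provide tested implementations of both quantities.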
Using the results obtained from these different methods, we identified the feature type and model combinations that perform best over various follow-up periods. The best model is a random survival forest with a combination of clinical and radiomic features as input. Starting roughly 330 days after the first scan, this combination model maintains a 30-day mean cumulative/dynamic area under the receiver operating characteristic curve (AUC) of approximately 0.8 for about one year, peaking at 0.839 at 810 days after the first scan.
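The headline metric above is a time-dependent AUC. As a hedged illustration only, the stdlib-only Python sketch below computes a simplified, unweighted cumulative/dynamic AUC at a horizon t: cases are patients who died on or before t, controls are patients whose follow-up extends beyond t, and patients censored on or before t are dropped. Established implementations such as scikit-survival's `cumulative_dynamic_auc` additionally apply inverse-probability-of-censoring weights, which this sketch omits. The helper `mean_auc_over_window` is one assumed reading of "30-day mean" (averaging AUC(t) over 30 consecutive daily horizons); the abstract does not say how the thesis computed it, and all names and data here are hypothetical.

```python
def cumulative_dynamic_auc_at(times, events, risk_scores, t):
    """Unweighted cumulative/dynamic AUC at horizon t.

    Cases: patients with an observed death on or before t.
    Controls: patients whose follow-up time exceeds t.
    Patients censored on or before t are excluded.
    Returns None when there are no cases or no controls at t.
    """
    cases = [r for u, e, r in zip(times, events, risk_scores) if e and u <= t]
    controls = [r for u, r in zip(times, risk_scores) if u > t]
    if not cases or not controls:
        return None
    score = 0.0
    for rc in cases:
        for rn in controls:
            if rc > rn:
                score += 1.0  # case ranked above control
            elif rc == rn:
                score += 0.5  # tie counts half
    return score / (len(cases) * len(controls))


def mean_auc_over_window(times, events, risk_scores, start, width=30):
    """Average AUC(t) over daily horizons start, ..., start + width - 1 (assumed reading)."""
    values = [cumulative_dynamic_auc_at(times, events, risk_scores, t)
              for t in range(start, start + width)]
    values = [v for v in values if v is not None]  # skip undefined horizons
    return sum(values) / len(values) if values else None


# Hypothetical toy data: days from first LDCT scan, death indicator, model risk score.
days = [200, 350, 350, 500, 720, 900]
died = [1, 1, 0, 1, 0, 0]
risk = [0.9, 0.7, 0.4, 0.6, 0.65, 0.2]

print(cumulative_dynamic_auc_at(days, died, risk, 600))
print(mean_auc_over_window(days, died, risk, 480))
```

Because cases accumulate as t grows, AUC(t) changes with the follow-up horizon, which is why the abstract reports performance as a function of days after the first scan rather than as a single number.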
Issue Date:2021-04-27
Rights Information:copyright Anh Leu 2021
Date Available in IDEALS:2021-09-17
Date Deposited:2021-05
