Files in this item
Files | Description | Format |
---|---|---|
application/pdf | (no description provided) |
Description
Title: | Survival analysis for lung cancer patients |
Author(s): | Leu, Anh |
Advisor(s): | Do, Minh |
Department / Program: | Electrical & Computer Eng |
Discipline: | Electrical & Computer Engr |
Degree Granting Institution: | University of Illinois at Urbana-Champaign |
Degree: | M.S. |
Genre: | Thesis |
Subject(s): | survival analysis; lung cancer; lung cancer prognosis; Cox proportional hazards; Kaplan-Meier estimator; random survival forests; machine learning |
Abstract: | Cancer is one of the leading causes of death. Lung cancer, in particular, is the leading cause of cancer death in both men and women, accounting for 23% of all cancer deaths in 2019 according to the Centers for Disease Control and Prevention. One particular problem with lung cancer is that it usually has a poor prognosis, with a five-year survival rate of only 21% according to the SEER Cancer Statistics Review, 1975-2017. With such a deadly disease, it is crucial to predict the survival likelihood of cancer patients. However, this is not an easy task because many factors affect disease progression. This thesis is based on the existing National Lung Screening Trial (NLST) dataset and provides an in-depth analysis of the different features influencing lung cancer prognosis. We added nodule annotations to the NLST dataset and extracted radiomic features from each nodule. Using the newly acquired radiomic features, coupled with the existing clinical data from the original NLST dataset, we examined different prognostic models to predict death from lung cancer from the first low-dose computed tomography (LDCT) scan. The model using both clinical and radiomic features shows relative performance improvements over the models using only clinical information, indicating the importance of the additional radiomic features. The best model using clinical input alone has a concordance index of 0.589, whereas the best model using a combination of clinical and radiomic features has a concordance index of 0.657. We rigorously cross-examined each feature's relationships and the models for each feature type using data analysis and survival analysis models. For each feature type, we used one representative survival analysis model from semi-parametric methods (Cox proportional hazards model), one from non-parametric methods (Kaplan-Meier estimator), and one from machine learning approaches (random survival forests). 
Using the results obtained from these different methods, we identified the feature types and model combinations that give the best performance for various follow-up periods. The best model is random survival forests with a combination of clinical and radiomic features as input. Starting roughly 330 days after the first scan, the combination model sustains a 30-day mean cumulative/dynamic area under the receiver operating characteristic curve (AUC) of approximately 0.8 for about one year, peaking at 0.839 at 810 days after the first scan. |
Issue Date: | 2021-04-27 |
Type: | Thesis |
URI: | http://hdl.handle.net/2142/110853 |
Rights Information: | copyright Anh Leu 2021 |
Date Available in IDEALS: | 2021-09-17 |
Date Deposited: | 2021-05 |
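
Note on the concordance index: the abstract compares models by concordance index (0.589 versus 0.657). As a rough illustration of what that metric measures, the sketch below computes Harrell's concordance index in plain Python. It is not taken from the thesis: the function name `concordance_index`, its arguments, and the toy data are hypothetical, and pairs with tied follow-up times are simply skipped, which is a simplification.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Illustrative Harrell's C-index (hypothetical helper, not the thesis code).

    times:       follow-up time per patient
    events:      1 if death was observed, 0 if the patient was censored
    risk_scores: model output, where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification: skip tied times. A pair is also not comparable
        # when the patient with the shorter follow-up was censored.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier death was ranked as higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented purely for illustration.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.4, 0.1]
print(concordance_index(toy_times, toy_events, toy_risk))  # 1.0
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk, which is the scale on which the reported 0.589 and 0.657 should be read.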
This item appears in the following Collection(s)
- Dissertations and Theses - Electrical and Computer Engineering
- Graduate Dissertations and Theses at Illinois