Files in this item

File: HUANG-DISSERTATION-2019.pdf (2MB)
Description: (no description provided)
Format: PDF

Description

Title: Integrating heterogeneous data into electronic medical record analysis
Author(s): Huang, Edward W.
Director of Research: Zhai, ChengXiang
Doctoral Committee Chair(s): Zhai, ChengXiang
Doctoral Committee Member(s): Farnoud, Farzad; Campbell, Roy H.; Peng, Jian; Sinha, Saurabh
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: Ph.D.
Genre: Dissertation
Subject(s): Electronic medical records; data mining; knowledge graph; heterogeneous data
Abstract: Electronic medical records (EMRs) are the digital equivalent of paper records at a clinician's office. They contain patient information such as treatment and medical history, and have been shown to have a wide variety of benefits. However, EMRs typically contain a multitude of diverse data, including images, doctor notes, medical test results, and genomic data. This heterogeneity generates high dimensionality and data sparsity, which are two of the most prevalent culprits that exacerbate already difficult computational problems. Additionally, domain-specific characteristics, such as the existence of synonyms in the medical vocabulary, introduce ambiguity. This can further reduce the data mining potential of EMRs. This thesis is a systematic study that addresses these issues associated with EMRs. In particular, I utilized heterogeneous data sources that are typically incompatible, and then developed frameworks in which these data sources complement one another. As a result, these methods have the potential for direct clinical translation, paving the way for improving healthcare from a data-driven perspective. To improve a variety of downstream healthcare applications, such as patient subcategorization, survival analysis, and visualization, I used external networks of domain knowledge consisting of drug-symptom relationships, protein-protein interactions, and genetic information to enhance patient records. I found that this enhancement process increased the data mining capabilities as well as the interpretability of the EMRs. To improve EMR retrieval systems, I developed a query expansion method that frames symptoms and treatments as two different languages. I found that a topic modeling method that follows this dual-language framework yielded the highest performance.
Lastly, I showed that due to pathological similarities, jointly studying Alzheimer's disease and Parkinson's disease resulted in higher computational power by effectively increasing the size of the training datasets. This allowed for the accurate prediction of the onset of dementia in both diseases. Each of these results can lay the groundwork for applications that have the potential to be implemented directly in clinical practice, improving the safety and quality of patient care.
Issue Date: 2019-04-16
Type: Text
URI: http://hdl.handle.net/2142/104778
Rights Information: Copyright 2019 Edward W Huang
Date Available in IDEALS: 2019-08-23
Date Deposited: 2019-05

