Files in this item



PENG-THESIS-2019.pdf (application/pdf, 4 MB), Restricted Access


Supplementary.pdf (application/pdf, 13 MB), Restricted Access


Title: E2M: A deep learning framework for associating combinatorial methylation patterns with gene expression
Author(s): Peng, Jianhao
Advisor(s): Ochoa, Idoia; Milenkovic, Olgica
Department / Program: Electrical & Computer Engineering
Discipline: Electrical & Computer Engineering
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Gene expression; Inception network; Quantized neural network
Abstract: We focus on the new problem of determining which methylation patterns in gene promoters are strongly associated with gene expression in cancer cells of different types. Although a number of results on the influence of methylation on expression have been reported in the literature, our approach is unique in that it retrospectively predicts the combinations of methylated sites in gene promoter regions that are reflected in the expression data. Reversing the traditional prediction order often makes the model parameters easier to estimate, since real-valued data are used to predict categorical data rather than vice versa; in addition, it allows one to better assess the overall influence of methylation in modulating expression via state-of-the-art learning methods. For this purpose, we developed a novel neural network learning framework, termed E2M (Expression-to-Methylation), that predicts the status of different methylation sites in the promoter regions of several biomarker genes from sufficient statistics of the full gene expression profile, captured through Landmark genes. We ran our experiments on both unquantized and quantized expression values and neural network weights, to illustrate the robustness of the method and to reduce the storage footprint of the processing pipeline. We implemented a number of machine learning algorithms for the new problem of methylation pattern inference, including multiclass regression, canonical correlation analysis (CCA), naive fully connected neural networks, and inception neural networks. Inception neural networks such as the E2M learners outperform all other techniques, with an average prediction accuracy of 82% when tested on 3,671 pan-cancer samples including low-grade glioma, glioblastoma, lung adenocarcinoma, lung squamous cell carcinoma, and stomach adenocarcinoma.
As an illustrative example, the prediction accuracy for the methylation pattern in the promoter of gene GATA6 in glioblastoma samples increases by 20% when inception networks are used instead of simple fully connected neural networks. These accuracy results remain largely unchanged even when both expression values and network weights are quantized. Our work also provides new insight into the importance of specific methylation site patterns for expression variation across genes. In this context, we identified genes for which the overwhelming majority of patients exhibit a single methylation pattern, and other genes with three or more significant classes of methylation patterns. Inception networks identify such patterns with high accuracy and suggest a possible stratification of cancers based on methylation pattern profiles. The E2M code and datasets are freely available at
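The abstract mentions quantizing both expression values and network weights but does not say which quantizer E2M uses. As a minimal sketch, assuming a simple min-max uniform quantizer with `bits` bits, the following shows how real-valued expression values (or a flattened vector of network weights) can be mapped to a small set of integer levels. `uniform_quantize` and `dequantize` are hypothetical helper names, not part of the E2M code.

```python
from typing import List, Sequence, Tuple


def uniform_quantize(values: Sequence[float], bits: int) -> Tuple[List[int], float, float]:
    """Map real values to integer levels 0..2**bits - 1 (min-max uniform quantizer).

    Hypothetical helper for illustration; the quantizer actually used by E2M
    is not specified in the abstract.
    """
    if bits < 1:
        raise ValueError("bits must be at least 1")
    lo, hi = min(values), max(values)
    max_level = 2 ** bits - 1  # index of the highest quantization level
    if hi == lo:
        # Constant input: every value maps to level 0 with a zero step.
        return [0] * len(values), lo, 0.0
    step = (hi - lo) / max_level
    # Round each value to the nearest grid point lo + k * step.
    codes = [round((v - lo) / step) for v in values]
    return codes, lo, step


def dequantize(codes: Sequence[int], lo: float, step: float) -> List[float]:
    """Reconstruct approximate real values from integer codes."""
    return [lo + c * step for c in codes]


if __name__ == "__main__":
    expression = [0.0, 1.3, 2.9, 4.0, 7.5]  # toy values, not real expression data
    codes, lo, step = uniform_quantize(expression, bits=2)
    print(codes, step)  # [0, 1, 1, 2, 3] 2.5
    print(dequantize(codes, lo, step))  # [0.0, 2.5, 2.5, 5.0, 7.5]
```

With 2 bits, each reconstructed value lies within step/2 = 1.25 of the original, and only the 2-bit codes plus the two floats `lo` and `step` need to be stored, which is where the storage saving comes from in this sketch.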
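To make the idea of "classes of methylation patterns" concrete, the sketch below assumes each promoter site has a methylation level in [0, 1] (for example, an array beta value), binarizes it with a hypothetical 0.5 cutoff, and counts how many samples share each combined pattern. The helper names, the cutoff, and the `min_fraction` rule for calling a class significant are illustrative assumptions, not the criteria used in the thesis. Each distinct pattern string could then serve as one categorical label for an expression-to-methylation classifier.

```python
from collections import Counter
from typing import Dict, List, Sequence


def methylation_pattern(levels: Sequence[float], threshold: float = 0.5) -> str:
    """Binarize per-site methylation levels in [0, 1] into a pattern such as '110'.

    The 0.5 cutoff is an assumption for illustration only.
    """
    return "".join("1" if x >= threshold else "0" for x in levels)


def significant_patterns(samples: List[Sequence[float]], min_fraction: float = 0.2) -> Dict[str, float]:
    """Return each pattern carried by at least `min_fraction` of the samples, with its frequency."""
    counts = Counter(methylation_pattern(s) for s in samples)
    n = len(samples)
    # most_common() orders patterns from most to least frequent.
    return {p: c / n for p, c in counts.most_common() if c / n >= min_fraction}


if __name__ == "__main__":
    # Toy data: 10 patients x 3 promoter sites (made up, not from the thesis).
    samples = [
        [0.9, 0.8, 0.1], [0.7, 0.6, 0.2], [0.8, 0.9, 0.3],  # pattern 110
        [0.1, 0.2, 0.1], [0.2, 0.1, 0.0], [0.3, 0.2, 0.4],  # pattern 000
        [0.9, 0.9, 0.9], [0.6, 0.7, 0.8], [0.8, 0.6, 0.7],  # pattern 111
        [0.1, 0.9, 0.1],                                    # pattern 010
    ]
    print(significant_patterns(samples))  # {'110': 0.3, '000': 0.3, '111': 0.3}
```

In this toy data, three patterns each cover 30% of samples and would count as three significant classes, while a gene where nearly all samples share one pattern would return a single entry.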
Issue Date: 2019-03-29
Rights Information: Copyright 2019 Jianhao Peng
Date Available in IDEALS: 2019-08-23
Date Deposited: 2019-05
