Files in this item



Xie_Dan.pdf (application/pdf, 10MB)


Title: Applying integrative computational models to study the evolution of gene regulation
Author(s): Xie, Dan
Director of Research: Zhong, Sheng
Doctoral Committee Chair(s): Zhong, Sheng
Doctoral Committee Member(s): Stubbs, Lisa J.; Jakobsson, Eric; Price, Nathan D.; Ma, Jian
Department / Program: Bioengineering
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Systems biology
Gene Regulatory Network
Gene expression
Transcription factor binding site
Histone modification
Deoxyribonucleic Acid (DNA) Methylation
Embryonic stem cells
Abstract: Gene regulatory networks dynamically control the expression levels of all genes and are key to explaining various phenotypes and biological processes. Advances in high-throughput measurement technologies, such as microarrays and next-generation sequencing, have enabled us to globally scrutinize various cell properties related to gene regulation and to build statistical models that make quantitative predictions. The evolutionary process has left many kinds of traces in current biological systems. Studying the evolution of gene regulatory networks in comparable cell types across species is an efficient way to unravel such evolutionary traces and to better understand regulatory mechanisms. The two main themes of my research are: analysing various "omics" data in an evolutionary context to identify conservation and changes in gene regulatory networks; and building computational models that incorporate different "omics" data for the annotation of genomes and the prediction of evolution in gene regulation. The second chapter of my thesis described a computational algorithm for de novo prediction of transcription factor binding site motifs in multiple species. The algorithm, named "GibbsModule", uses three information sources to improve prediction power: 1) co-expressed genes sharing the same set of motifs; 2) binding sites co-localizing to form modules; and 3) the conservation of motif use across species. We developed a Gibbs sampling procedure to incorporate the three information sources. GibbsModule outperformed existing algorithms on several synthetic and real datasets. When applied to study the binding regions of KLF in embryonic stem cells, GibbsModule discovered a new functional motif. We also used ChIP followed by qPCR to demonstrate that the binding affinity of binding sites predicted by GibbsModule is stronger than that of non-predicted motifs.
Both genome sequence and gene expression carry information about gene regulation. Therefore, we can learn more about gene regulatory networks by jointly analysing sequence and expression data. In the third chapter of my thesis, we first introduced a comparative study of the pre-implantation development of embryos in three mammalian species: human, mouse, and cow. We measured time-course expression profiles of the embryos during early development and analysed them together with genome sequence data and ChIP-seq data. We observed that a large portion of homologous genes changed expression, suggesting a prevalent rewiring of gene regulation. We associated the changes in gene expression with different types of cis-changes in the genome sequences. In particular, we found that about 10% of species-specific transposons carry multiple functional binding sites, which are likely to explain the evolution of gene expression. The second part of this chapter presented a phylogenetic model that incorporated changes in motif use and gene expression to infer the rewiring of gene regulatory networks.
Epigenetic modifications, including histone modifications and DNA methylation, are known to be associated with gene regulation. In chapter four, we studied the evolution of epigenomes in pluripotent stem cells of humans, mice, and pigs. We observed conservation of epigenomes in different categories of genomic regions. We found evidence of positive and negative selection on the evolution of epigenomes. Using linear regression models, we found that the evolution of epigenomes can largely explain the evolution of gene expression. In the second part of this chapter, we introduced a statistical model that describes the evolution of genomes considering both DNA sequences and epigenetic modifications. Based on this evolutionary model, we improved the current alignment algorithm with information on the distribution of epigenetic modifications.
Issue Date: 2011-08-26
Rights Information: Copyright 2011 Dan Xie
Date Available in IDEALS: 2011-08-26
Date Deposited: 2011-08