Files in this item

File: Ling_Xu.pdf (2MB)
Description: (no description provided)
Format: PDF (application/pdf)

Description

Title: Computational Prediction of Functional Elements through Comparative Genomics
Author(s): Ling, Xu
Director of Research: Zhai, ChengXiang; Sinha, Saurabh
Doctoral Committee Chair(s): Zhai, ChengXiang
Doctoral Committee Member(s): Sinha, Saurabh; Schatz, Bruce R.; Blanchette, Mathieu
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: Ph.D.
Genre: Dissertation
Subject(s): Bioinformatics; Comparative Genomics; Cis-regulatory Elements; Conserved Gene Clusters; Probabilistic Model; Algorithm
Abstract: Understanding the evolution and organization of genomic functional elements is one of the most important goals of genomic studies. The complexity of the functional information encoded in genome sequences, and the variability in how that information is encoded, make this a very challenging task. Nucleotide mutations and genome-wide rearrangements pose additional challenges for identifying and understanding the functional elements in the genome. On the other hand, due to natural selection, functional sequences tend to evolve at a slower rate than non-functional sequences. Therefore, the conservation pattern across species often indicates where functional sequences are located. With the increasing number of species being sequenced, comparative genomics, which compares the sequences from multiple species at varying evolutionary distances, has now emerged as a very powerful approach for identifying various types of functional elements, such as protein-coding genes, transcriptional regulatory sequences, and non-coding RNA genes. This dissertation research has focused on two grand challenges of genomics: (i) to decode cis-regulatory modules (CRMs), non-coding DNA sequences controlling gene expression; and (ii) to discover gene groups that are functionally related. For both lines of work, the key idea is to leverage the power of comparative genomics in decoding the genomic information.

The first part of this thesis develops a probabilistic framework for CRM prediction. This framework is based on a probabilistic model of CRM evolution, which captures the content features of regulatory sequences as well as their dynamic process of evolution. The model advances previous models by handling the inherent uncertainties of transcription factor binding site (TFBS) annotations in a probabilistic framework, since partially conserved binding sites have been recognized as an important aspect of regulatory sequence evolution. We explicitly model the two stochastic processes of loss of existing TFBSs and gain of TFBSs from background nucleotides, to leverage the power of comparative genomics for CRM prediction while at the same time utilizing the information of this lineage-specific pattern.

The second part of this thesis focuses on discovering functionally related gene groups. Understanding how genes are organized in genomes and what information is encoded in genomic contexts is one of the fundamental problems in genomics. During evolution, gene order is generally not well conserved because of the rapid rearrangement events that reshuffle genomes. On the other hand, functionally related genes may be constrained to remain close to each other due to natural selection, forming so-called conserved gene clusters. Conservation of the spatial organization of genes provides an important source of information that is orthogonal to the primary sequences of genes and thus could be exploited to supplement existing genomic analysis tools. In this thesis, we developed a highly efficient algorithm to discover conserved gene clusters across multiple genomes. These gene clusters are likely under some evolutionary constraint and indicate a functional relationship among the genes within a cluster. Our algorithm advances existing work by allowing genes in the clusters to appear in different orders while making the computation orders of magnitude faster. This allows us to detect conserved gene clusters under flexible evolutionary constraints in a large number of genomes. In addition, we developed a statistical evaluation method that incorporates the evolutionary relationships among genomes, a key aspect that has been missing in most previous studies. The combined algorithmic and statistical methods provide a rigorous framework for systematically studying evolutionary constraints on genomic contexts.
Issue Date: 2010-01-06
URI: http://hdl.handle.net/2142/14586
Rights Information: Copyright 2009 Xu Ling
Date Available in IDEALS: 2010-01-06
Date Deposited: December 2

