Files in this item

File: Ling_Xu.pdf (2MB)
Description: Main article
Format: PDF (application/pdf)

Description

Title: Computational Prediction of Functional Elements through Comparative Genomics
Author(s): Ling, Xu
Contributor(s): Sinha, Saurabh; Zhai, ChengXiang; Schatz, Bruce R.; Blanchette, Mathieu
Subject(s): bioinformatics; comparative genomics; cis-regulatory sequence; conserved gene cluster
Abstract: Understanding the evolution and organization of genomic functional elements is one of the most important goals of genomic studies. The complexity of the functional information encoded in genome sequences, and the variety of ways in which that information is encoded, make this a very challenging task. Nucleotide mutations and genome-wide rearrangements pose additional challenges for identifying and understanding the functional elements in the genome. On the other hand, due to natural selection, functional sequences tend to evolve at a slower rate than non-functional sequences. Therefore, the conservation pattern across species often indicates where functional sequences are located. With the increasing number of species being sequenced, comparative genomics, which compares the sequences from multiple species at varying evolutionary distances, has now emerged as a very powerful approach for identifying various types of functional elements, such as protein-coding genes, transcriptional regulatory sequences, and non-coding RNA genes. This dissertation research has focused on two grand challenges of genomics: (i) to decode cis-regulatory modules (CRMs), non-coding DNA sequences controlling gene expression; and (ii) to discover gene groups that are functionally related. For both lines of work, the key idea is to leverage the power of comparative genomics in decoding the genomic information. The first part of this thesis developed a probabilistic framework for CRM prediction. This framework is based on a probabilistic model of CRM evolution, which captures the content features of regulatory sequences as well as their dynamic process of evolution. This model advances previous models by dealing with the inherent uncertainties of transcription factor binding site (TFBS) annotations in a probabilistic framework, as partially conserved binding sites have been recognized as an important aspect of regulatory sequence evolution. 
We explicitly model the two stochastic processes of loss of existing TFBSs and gain of TFBSs from background nucleotides, to leverage the power of comparative genomics for CRM prediction while at the same time utilizing the information in this lineage-specific pattern. The second part of this thesis focuses on discovering functionally related gene groups. Understanding how genes are organized in genomes and what information is encoded in genomic contexts is one of the fundamental problems in genomics. During evolution, gene order is generally not well conserved because of the rapid rearrangement events that reshuffle genomes. On the other hand, functionally related genes may be constrained to remain close to each other due to natural selection, forming so-called "conserved gene clusters". Conservation of the spatial organization of genes provides an important source of information that is orthogonal to the primary sequences of genes and thus could be exploited to supplement our existing genomic analysis tools. In this thesis, we developed a highly efficient algorithm to discover conserved gene clusters across multiple genomes. These gene clusters are likely under some evolutionary constraint and indicate a functional relationship among the genes within a cluster. Our algorithm advances existing work by allowing genes in the clusters to appear in different orders while at the same time making the computation orders of magnitude faster. This allows us to detect conserved gene clusters under flexible evolutionary constraints in a large number of genomes. In addition, we developed a statistical evaluation method that incorporates the evolutionary relationships among genomes, a key aspect that has been missing in most previous studies. The combined algorithmic and statistical methods provide a rigorous framework for systematically studying evolutionary constraints on genomic contexts.
Issue Date: 2009-12
Genre: Dissertation / Thesis
Type: Text
Language: English
URI: http://hdl.handle.net/2142/14457
Publication Status: unpublished
Peer Reviewed: not peer reviewed
Date Available in IDEALS: 2009-12-15

