Files in this item

Kim_Jaebum.pdf (application/pdf, 2MB)


Title: Probabilistic Model-Based Approach to Evolutionary Analysis of Non-Coding Sequences
Author(s): Kim, Jaebum
Director of Research: Sinha, Saurabh
Doctoral Committee Chair(s): Sinha, Saurabh
Doctoral Committee Member(s): Han, Jiawei; Zhai, ChengXiang; Ma, Jian
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Multiple sequence alignment; Probabilistic model; Insertions and deletions; Simulation-based benchmark; Regulatory sequences; Sequence evolution
Abstract: Non-coding sequences, constituting a large fraction of genomic DNA, are of great importance because (i) they harbor functional elements that are involved in the regulation of gene expression and (ii) they are essential for the study of genome structure and evolution. The availability of genome sequences of closely related species has provided opportunities to analyze non-coding sequences by comparing multiple genomes from different species. The success of comparative genomic studies relies on bioinformatics tools that aid the comparison and analysis of genome sequences. Here, we propose and develop computational tools for the evolutionary analysis of non-coding sequences, based on probabilistic models of sequence evolution. We present a probabilistic framework for finding the locations of insertions and deletions (indels) in a multiple alignment. Its performance is found to be better than that obtained by a parsimony-based method. We study the evolution of sequences involved in the regulation of body patterning in the Drosophila embryo, reporting statistical evidence in favor of key evolutionary hypotheses related to regulatory elements and constraints on indels. We also propose a new simulation scheme for generating biologically realistic benchmarks for the alignment of non-coding sequences. This scheme is used to construct benchmarks for Drosophila non-coding sequences, and evaluation results are shown for several multiple alignment and indel annotation tools on those benchmarks. Finally, we develop a probabilistic framework for multiple sequence alignment that finds an optimal alignment by incrementally building up alignment columns, based on a model for the evolution of three sequences and using the joint probability of an alignment column as a substitute for the traditionally used sum-of-pairs score.
We find that the new framework produces alignments of much greater specificity than state-of-the-art methods, without compromising too much in terms of sensitivity. The computational tools developed here will play a significant role in solving many biological problems and will further contribute to broadening our understanding of organismal diversity and evolution.
Issue Date: 2010-08-31
Rights Information: Copyright 2010 Jaebum Kim
Date Available in IDEALS: 2010-08-31
Date Deposited: 2010-08
