Title:Cis-regulatory module analysis: inferring regulatory networks and underlying mechanisms
Author(s):Kazemian, Abdol Majid
Director of Research:Sinha, Saurabh
Doctoral Committee Chair(s):Sinha, Saurabh
Doctoral Committee Member(s):Stubbs, Lisa J.; Brodsky, Michael; Zhai, ChengXiang; Ma, Jian
Department / Program:Computer Science
Discipline:Computer Science
Degree Granting Institution:University of Illinois at Urbana-Champaign
Subject(s):Cis-regulatory module
transcription factor
transcription factor (TF) interaction
interacting TF signatures (iTFs)
Abstract:A major challenge in understanding metazoan genomes is to find and annotate the regions that control the precise spatial and temporal expression of genes. Cis-regulatory modules (CRMs, or enhancers), the main players in this regulatory process, are typically short (<1 kb) sequences embedded in non-coding regions of the genome. They harbor cis-elements (binding sites, or motifs) for one or more related transcription factors (TFs) and mediate a discrete aspect of the expression pattern of a nearby gene. Although decades of research in biology have provided scientists with hundreds of such sequences, we are far from completing the search for these regulatory regions and understanding their underlying mechanisms. The goal of this thesis is to use computational and statistical methods to guide the search for novel CRMs, reveal the mechanisms of their regulatory action, and elucidate specific biological networks using the developed methodology. The first part of my thesis develops several statistical methods to find novel enhancers using existing enhancers as training data. Current computational enhancer prediction methods rely on prior knowledge of the relevant transcription factors. We introduce a novel computational paradigm for enhancer discovery in the common scenario where the relevant transcription factors and/or motifs are unknown. Beginning with a small set of enhancers mediating a common gene expression pattern, our methods search genome-wide for enhancers with similar functionality. They employ word-based statistical and machine learning techniques and do not require (or rely on) known motifs or accurate motif discovery. We apply these approaches to a wide range of less-studied networks in fruit fly and mouse. The second part of my thesis develops a qualitative model to predict the function of enhancers. A long-standing question in transcriptional gene regulation is how a gene’s sequence encodes its expression (function).
In fruit flies, segmentation of the body plan along the anterior-posterior (A/P) axis is achieved through a well-characterized transcriptional regulatory network that includes several known enhancers. Using these enhancers as training data, we learn a generalized linear model that combines the occupancies of the relevant TFs (the product of each TF’s binding strength and its concentration) to predict enhancer function. We show that this model can capture the physical roles (activation or repression) of transcription factors as well as predict enhancer function. We use this model to scan the fly genome for segments that drive an A/P pattern similar to that of their neighboring genes, and we construct a quantitative network of anteroposterior patterning in the fruit fly embryo. The third part of my thesis develops a model to simultaneously locate enhancers and annotate the expression patterns they drive. The model does not rely on already characterized enhancers; in this sense, it can be thought of as an extension of the second project, in which known enhancers were available. The model iteratively samples a “more reliable” set of enhancers from a large pool of computationally predicted enhancers and re-learns a “more reliable” logistic regression model from these enhancers, to be used in the next iteration of enhancer sampling. In other words, by defining the objective function as “how well enhancers recapitulate one or more aspects of the expression pattern of their nearby gene”, we iteratively sample from a collection of candidate enhancers to maximize this objective function. The last part of my thesis develops a statistical framework for finding sequence signatures of TF-TF interaction. We search for two types of sequence signatures: overlap or depletion among the bound regions of pairs of transcription factors, and orientation and/or distance bias among transcription factor binding sites.
These sequence signatures explain several distinct mechanisms of combinatorial gene regulation, such as protein-protein interaction, short-range repression, and co-regulation. Used as a set of informative features, these signatures can also improve methods for discovering enhancers and predicting their functions. We use our framework to search genome-wide for these signatures among a large collection of characterized TFs (more than 300) in fruit fly.
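The motif-free, word-based idea in the first part can be illustrated with a minimal sketch. It assumes a simple scheme that is not taken from the thesis: each sequence is summarized by counts of canonical k-mers (a word and its reverse complement are pooled), and a candidate is scored by the cosine similarity between its k-mer profile and the pooled profile of the training enhancers. The function names and the default k = 4 are hypothetical, and the thesis's actual statistics and classifiers may differ.

```python
from collections import Counter
from math import sqrt

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(word: str) -> str:
    """Return the lexicographically smaller of a word and its reverse complement."""
    rc = word.translate(COMPLEMENT)[::-1]
    return min(word, rc)


def kmer_counts(seq: str, k: int = 4) -> Counter:
    """Count canonical k-mers, skipping windows that contain non-ACGT characters."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[canonical(word)] += 1
    return counts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[w] for w, count in a.items() if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def score_candidate(candidate: str, training: list, k: int = 4) -> float:
    """Similarity of a candidate's k-mer profile to the pooled training-enhancer profile."""
    pooled = Counter()
    for enhancer in training:
        pooled.update(kmer_counts(enhancer, k))
    return cosine(kmer_counts(candidate, k), pooled)
```

In a genome-wide scan, one would slide a window (the window length is an assumption, e.g. a few hundred bp) along non-coding sequence and rank windows by this score; no motif is needed at any step, which is the property the abstract emphasizes.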
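For the second part, the hedged sketch below shows how TF occupancies, defined in the abstract as binding strength times concentration, can enter a generalized linear model with a logistic link. The binding strengths, concentrations, and expression labels are made-up toy values, and fitting by plain batch gradient descent is an assumption, not the thesis's procedure. The sign of a fitted weight is what lets such a model express a TF's role: positive for an activator, negative for a repressor.

```python
import math


def occupancy(binding_strength: float, concentration: float) -> float:
    """Occupancy as described in the abstract: binding strength times TF concentration."""
    return binding_strength * concentration


def predict(weights, bias, occupancies):
    """Logistic GLM: probability that the enhancer drives expression at this position."""
    z = bias + sum(w * o for w, o in zip(weights, occupancies))
    return 1.0 / (1.0 + math.exp(-z))


def fit_glm(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean logistic loss."""
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for occ, target in zip(X, y):
            err = predict(weights, bias, occ) - target
            for j in range(n_features):
                grad_w[j] += err * occ[j]
            grad_b += err
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias


# Hypothetical toy data: one activator and one repressor at nine positions,
# with made-up binding strengths; expression is on where activator exceeds repressor.
ACT_STRENGTH, REP_STRENGTH = 2.0, 1.0
LEVELS = (0.0, 0.5, 1.0)
X = [[occupancy(ACT_STRENGTH, a), occupancy(REP_STRENGTH, r)] for a in LEVELS for r in LEVELS]
y = [1 if a > r else 0 for a in LEVELS for r in LEVELS]
weights, bias = fit_glm(X, y)
```

On this toy data the activator weight comes out positive and the repressor weight negative, which mirrors the abstract's claim that the fitted model can recover whether a TF activates or represses.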
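For the last part, the sketch below shows two standard tests one could apply to the two signature types. It assumes bound regions are discretized into a fixed number of genomic bins and that binding sites are given as (position, strand) pairs; a hypergeometric tail probability then scores overlap (enrichment) or depletion of two TFs' bound bins, and an exact two-sided binomial test scores orientation bias among nearby site pairs. The helper names and the choice of tests are illustrative assumptions, not necessarily the statistics used in the thesis.

```python
from math import comb


def overlap_enrichment_p(k, n_bins, n_a, n_b):
    """P(overlap >= k) if TF A's n_a bound bins and TF B's n_b bound bins were
    placed independently at random among n_bins bins (hypergeometric upper tail)."""
    total = comb(n_bins, n_b)
    hits = sum(comb(n_a, i) * comb(n_bins - n_a, n_b - i)
               for i in range(max(k, 0), min(n_a, n_b) + 1))
    return hits / total


def overlap_depletion_p(k, n_bins, n_a, n_b):
    """P(overlap <= k) under the same null (hypergeometric lower tail)."""
    total = comb(n_bins, n_b)
    hits = sum(comb(n_a, i) * comb(n_bins - n_a, n_b - i)
               for i in range(0, min(k, n_a, n_b) + 1))
    return hits / total


def orientation_counts(sites_a, sites_b, max_distance):
    """Count A-B site pairs within max_distance bp, split by relative orientation.
    Each site is a (position, strand) tuple with strand '+' or '-'."""
    same = opposite = 0
    for pos_a, strand_a in sites_a:
        for pos_b, strand_b in sites_b:
            if abs(pos_a - pos_b) <= max_distance:
                if strand_a == strand_b:
                    same += 1
                else:
                    opposite += 1
    return same, opposite


def orientation_bias_p(same, opposite):
    """Exact two-sided binomial test of same vs. opposite orientation (null p = 0.5)."""
    n = same + opposite
    if n == 0:
        return 1.0
    lower = sum(comb(n, i) for i in range(0, same + 1)) / 2 ** n
    upper = sum(comb(n, i) for i in range(same, n + 1)) / 2 ** n
    return min(1.0, 2 * min(lower, upper))
```

A distance bias could be probed the same way by tallying the pair distances that `orientation_counts` already iterates over and comparing their histogram to a shuffled background; that extension is likewise an assumption.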
Issue Date:2013-02-03
Rights Information:Copyright 2012 Abdol Majid Kazemian
Date Available in IDEALS:2013-02-03
Date Deposited:2012-12