Files in this item



RIVAS-ASTROZA-DISSERTATION-2015.pdf (application/pdf, 26MB)


Title: Bioinformatic methods for the analysis of genetic variability and regulation
Author(s): Rivas-Astroza, Marcelo Alejandro
Director of Research: Zhong, Sheng
Doctoral Committee Chair(s): Zhong, Sheng
Doctoral Committee Member(s): Jongeneel, Victor; Ma, Jian; Sinha, Saurabh
Department / Program: Bioengineering
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Histone post-translational modifications at single-nucleosome resolution; Comparative Genomics; Personalized Genomes
Abstract: Next-generation sequencing (NGS) technologies have brought several biological inquiries within our reach. In this thesis, we exploited NGS data to address questions about how a cell's genetic and epigenetic makeup gives rise to phenotypic traits and regulatory processes. We started by analyzing the relationship between genetic information and phenotype. On one hand, we used DNA-seq to compare the genomes of two behaviorally dissimilar Apis mellifera subspecies: Africanized honeybee (AHB) and European honeybee (EHB). Both types of bees are physically alike, but AHB show more intense and prolonged aggressive behavior when potential threats are detected near the colony. To determine the most divergent genes and promoters between AHB and EHB, we used fixed and polymorphic sites as metrics of between- and within-subspecies diversity, respectively. The most divergent genes and promoters were enriched for functions related to mitochondrial metabolism, and several promoters were directly linked to genes involved in synaptic processes. Whereas mitochondrial metabolism has been previously associated with behavioral traits, the presence of divergent mutations in the regulatory regions of genes related to synaptic functions is consistent with the hypothesis that changes in mood are triggered by cognitive processes. On the other hand, we looked at transcription factor binding differences between the paternal and maternal alleles of two human cell lines. For this, we created a computational tool, perEditor, that builds personalized genomes based on known genetic differences between the human reference genome (hg18) and each cell line. Using publicly available ChIP-seq data for the transcription factor NFkB, we found that the use of personalized genomes significantly affects the inferences of binding affinities. Moreover, mapping efficiency was significantly improved by the use of personalized genomes compared to the reference genome.
Together, these findings substantiate the use of comparative genomics among (1) subspecies and (2) alleles to pinpoint genetic drivers of phenotypic diversity. Then, we moved on to study the relationship between the epigenome, particularly histone post-translational modifications (PTMs), and the regulation of functional genomic elements. Histone marks control much of the workings of the genome. However, it remains unclear how different histone PTMs combine within individual nucleosomes to encode regulatory information. Standard protocols of chromatin immunoprecipitation followed by sequencing (ChIP-seq) produce fragment sizes that can be several times the length of the DNA needed to wrap a single nucleosome, making them unsuitable for unambiguously tracing histone PTMs to individual nucleosomes. Alternative ChIP-seq protocols employing enzymatic digestion of DNA can overcome this limitation by producing mono-nucleosomal footprints of histone PTMs. Still, care has to be taken to avoid confounding effects stemming from the relation between nucleosome co-localization levels and nucleosomal enrichment of histone marks. Here, we devised computational and statistical methods to exploit the power of enzymatic digestion of DNA coupled with ChIP-seq to generate genome-wide maps of histone marks at single-nucleosome resolution. In particular, we analyzed MNChIP-seq (short for MNase digestion of DNA coupled with ChIP-seq) libraries of H3K4me3, H3K27Ac, H3K9me3, and H3K27me3 in mouse embryonic stem cells. We started by asking what the genome coordinates of each nucleosome in the mouse genome are. To gain sensitivity, we combined all four MNChIP-seq libraries with publicly available MNase-seq data. Based on this strategy, we pinpointed the positions of 10,292,808 nucleosomes. This is one of the most comprehensive nucleosomal maps of the mouse genome, corresponding to 84% of the expected number of nucleosomes in the mouse genome.
We computed the likelihood of a nucleosome being marked by a histone mark as the quantile of its count of MNChIP-seq fragments within the set of nucleosomes with the same number of MNase-seq fragments. Using the 95% quantile as a lower threshold, we found 579,004; 591,998; 574,062; and 884,727 nucleosomes marked by H3K4me3, H3K27Ac, H3K27me3, and H3K9me3, respectively. Interestingly, 12,700 of these nucleosomes were not overlapped by any MNase-seq fragment but were discovered only by virtue of the reads coming from the four MNChIP-seq libraries, suggesting that they were not present or were located elsewhere at the time of the MNase-seq sampling. Then, we asked what the relation is between the combinatorial patterns of histone marks within individual nucleosomes and various functional genomic elements. We found that a significant number of nucleosomes were marked by two or more histone marks. Nucleosomes marked simultaneously by H3K4me3 and H3K27me3 were prevalent among bivalent domains compared to the genomic background. Nucleosomes immediately after the transcription start sites of genes or the intron-exon junctions of alternatively spliced exons were significantly enriched in histone PTMs compared to upstream and downstream nucleosomes, suggesting a position-specific effect in the encoding of regulatory information. Nucleosomes having the repressive marks H3K27me3 and H3K9me3 were enriched at the transcription start sites of highly active genes only if they were also co-localized with the activating mark H3K27Ac. Inclusion of alternatively spliced exons in the final mRNA was correlated with nucleosomes marked by H3K4me3, H3K27Ac, or H3K27me3, but was largely unaffected by nucleosomes marked by H3K9me3. Together, these findings put forward the idea that combinatorial patterns of histone PTMs within individual nucleosomes are fundamental units of regulatory information.
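The quantile-based scoring mentioned in the abstract (ranking a nucleosome's MNChIP-seq fragment count against nucleosomes with the same MNase-seq fragment count, then applying a 95% threshold) could be sketched roughly as below. This is only an illustration under stated assumptions, not the dissertation's implementation: the function names, the convention of ranking by the fraction of strictly smaller counts, and the toy counts are all hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def stratified_quantiles(mnase_counts, chip_counts):
    """Empirical quantile of each nucleosome's ChIP fragment count among
    nucleosomes that share the same MNase-seq fragment count.

    Assumption: the quantile is the fraction of nucleosomes in the same
    stratum with a strictly smaller ChIP count, so ties do not inflate it.
    """
    # Group ChIP counts by MNase-seq count (one stratum per MNase count).
    strata = defaultdict(list)
    for mnase, chip in zip(mnase_counts, chip_counts):
        strata[mnase].append(chip)
    for values in strata.values():
        values.sort()
    # bisect_left gives the number of values strictly below `chip`.
    return [
        bisect_left(strata[mnase], chip) / len(strata[mnase])
        for mnase, chip in zip(mnase_counts, chip_counts)
    ]


def is_marked(mnase_counts, chip_counts, threshold=0.95):
    """Flag nucleosomes whose stratified quantile reaches the threshold."""
    quantiles = stratified_quantiles(mnase_counts, chip_counts)
    return [q >= threshold for q in quantiles]


if __name__ == "__main__":
    # Toy data (hypothetical): 20 nucleosomes with 5 MNase-seq fragments
    # each, plus 3 nucleosomes with 2 MNase-seq fragments each.
    mnase = [5] * 20 + [2] * 3
    chip = list(range(20)) + [0, 0, 9]
    flags = is_marked(mnase, chip)
    print([i for i, flag in enumerate(flags) if flag])
```

Stratifying by MNase-seq count means a nucleosome is compared only with nucleosomes of similar occupancy, which is one way to address the confounding between nucleosome co-localization and mark enrichment noted in the abstract. Under the strictly-smaller convention assumed here, a stratum with fewer than 20 nucleosomes can never reach the 95% threshold, so a real analysis would need some policy for sparse strata.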
Issue Date: 2015-12-02
Rights Information: Copyright 2015 Marcelo Rivas-Astroza
Date Available in IDEALS: 2016-03-02
Date Deposited: 2015-12