Files in this item

File: KENDZIOR-THESIS-2019.pdf (7MB)
Description: (no description provided)
Format: PDF (application/pdf)

Description

Title: Structural variant calling by assembly in whole human genomes: Applications in hypoplastic left heart syndrome
Author(s): Kendzior, Matthew
Advisor(s): Hudson, Matthew E.
Contributor(s): Mainzer, Liudmila; Sinha, Saurabh
Department / Program: Crop Sciences
Discipline: Bioinformatics
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: M.S.
Genre: Thesis
Subject(s): structural variant; de Bruijn graph; genome assembly
Abstract: Variant discovery in medical research typically involves alignment of short sequencing reads to the human reference genome. SNPs and small indels (variants less than 50 nucleotides) are the most common types of variants detected from alignments. Structural variation can be more difficult to detect from short-read alignments, and thus many software applications aimed at detecting structural variants from short-read alignments have been developed. However, almost all of these detect the presence of variation in a sample using expected mate-pair distances from read data, making them unable to determine the precise sequence of the variant genome at the specified locus. Also, reads from a structural variant allele might not even map to the reference, and will thus be lost during variant discovery from read alignment. A variant calling by assembly approach was used with the software Cortex-var for variant discovery in Hypoplastic Left Heart Syndrome (HLHS). This method circumvents many of the limitations of variants called from a reference alignment: unmapped reads will be included in a sample’s assembly, and variants up to thousands of nucleotides can be detected, with the full sample variant allele sequence predicted. HLHS is a complex disease, and existing research indicates evidence for a genetic cause. HLHS is thought to have multiple genetic causes, and a variety of variants, from SNPs to chromosomal-level defects, have been identified in individuals with the disease. However, causative variants have only been identified in a few cases, suggesting that rare variants in a background of other causative variants contribute to the HLHS phenotype. The assembly-based approach was used to discover structural variants that are too large for detection by alignment-based methods aimed at detecting SNPs and indels, and too small for detection with lab-based methods such as those employing hybridization to arrays.
Using WGS data from 24 family trios with an HLHS proband, and 344 control individuals from the Mayo Clinic Biobank with no family history of HLHS, whole-genome de novo assemblies were performed for each individual, and variants were called in assembly graphs against a de Bruijn graph representation of the human reference sequence. For comparison, each individual was also run through the Sentieon software implementation of the Genome Analysis Toolkit Best Practices, and structural variants were also called from alignments with Sentieon DNAscope, a high-performance conventional structural variant caller. This approach has identified genes related to embryonic development where structural variants are significantly overrepresented in HLHS probands versus the Biobank control population.
Issue Date: 2019-10-11
Type: Text
URI: http://hdl.handle.net/2142/106172
Rights Information: Copyright 2019 Matthew Kendzior
Date Available in IDEALS: 2020-03-02
Date Deposited: 2019-12

