Files in this item

File: MAYS-THESIS-2020.pdf (3MB)
Description: (no description provided)
Format: PDF (application/pdf)

Description

Title: Navigating through the uncertainty of genotyping-by-sequencing data in polyploids
Author(s): Mays, Wittney Debora
Advisor(s): Sacks, Erik J
Contributor(s): Clark, Lindsay V; Ming, Ray; Lipka, Alexander E
Department / Program: Crop Sciences
Discipline: Bioinformatics
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: M.S.
Genre: Thesis
Subject(s): bioinformatics; variant calling; polyploidy; genotyping-by-sequencing; GBS
Abstract: The development of genotyping-by-sequencing (GBS) methods has facilitated genomics studies in non-model species, including polyploids. Variant and genotype calling methods have been established for autopolyploids, but for a species with a complex genome, such as sugarcane, the level of uncertainty within GBS data increases, making trait mapping difficult. Furthermore, variant and genotype calling remains a challenge for both recent and ancient allopolyploids (e.g. wheat, maize, soybean, Miscanthus), particularly where the reference genome contains highly similar paralogous sequences that do not pair at meiosis. Alignment of sequence tags to the appropriate position within highly duplicated reference genomes remains a challenge that is inadequately addressed by existing alignment software. Although some variant calling pipelines can discriminate a paralogous locus from a Mendelian locus, these paralogous loci are typically detected only so that they can be excluded from downstream analysis. We explore the significance of eliminating paralogous loci in downstream analysis using a newly developed pipeline that sorts sequence tags to their correct alignment locations based on the novel Hind/HE statistic. The goal of this study was to evaluate the sorting pipeline’s ability to align paralogous loci to the correct position with respect to the reference genome. Three studies were conducted: an analysis of a population of 400 individuals simulated based upon Triticum aestivum, a reanalysis of a previously published genome-wide study of Fusarium head blight in 273 wheat breeding lines, and a reanalysis of a previously published genome-wide study of traits associated with yield in a Miscanthus diversity panel. Results suggested that filtering sequences using the Hind/HE statistic underlying polyRAD v1.2 may lead to differences in the output of sequences. Further comparison of each output suggested that the output of the novel pipeline, polyRAD, was concentrated in gene-rich regions compared to other standard variant calling pipelines. From this study, we provide recommendations for future users of the polyRAD v1.2 variant calling pipeline. Overall, we conclude that polyRAD v1.2 is more useful for populations of outcrossing species.
Issue Date: 2020-10-05
Type: Thesis
URI: http://hdl.handle.net/2142/109337
Rights Information: Copyright 2020 Wittney Mays
Date Available in IDEALS: 2021-03-05
Date Deposited: 2020-12

