Files in this item



STEPHENS-THESIS-2015.pdf (application/pdf, 2MB), Restricted Access


Title: Empirical accuracy bounds for next-generation sequencing variant calling workflows
Author(s): Stephens, Zachary Daniel
Department / Program: Electrical & Computer Eng
Discipline: Electrical & Computer Engr
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Next-Generation Sequencing (NGS) Accuracy Benchmarking
Next-Generation Error Analysis Toolkit (NEAT)
Next-Generation Sequencing (NGS) Accuracy Bounds
Abstract: This thesis investigates the accuracy bounds imposed on alignment-based variant calling workflows due to inherent uncertainties introduced by sequencing platforms. In this work we will use simulated data to empirically quantify the maximum performance that can be expected for alignment and variant detection accuracy in a workflow. Short read sequencers are inherently incapable of producing reads that can be uniquely mapped to every position of the human reference genome, so errors are inevitable. We will analyze the repetitive content of several organisms, and estimate the maximum attainable alignment accuracy as a function of read length. Additionally, we will show that paired-end sequencing with large insert sizes (also referred to as "mate-pair" sequencing) is capable of mapping >99% of the human genome. We have developed a set of tools, NEAT (Next-generation Error Analysis Toolkit), which we use to create fault-injected genomic datasets. Our experiments utilize these datasets to showcase how the behavior of BWA and GATK workflows changes as a function of read lengths, error rates, quality scores, error types, and mutation types. We utilize these results to quantify the performance gains that can be expected by altering these properties of an NGS dataset. Our results highlight the sensitivity of alignment software to read lengths and error rates, and the sensitivity of variant callers to quality scores and structural variation.
Issue Date: 2015-05-01
Rights Information: Copyright 2015 Zachary Stephens
Date Available in IDEALS: 2015-07-22
Date Deposited: May 2015
