Files in this item



ECE499-Sp2019-guan.pdf (application/pdf, 538kB), restricted to U of Illinois
(no description provided)


Title: New compression scheme for integer annotation in VCF files
Author(s): Guan, Haozhong
Contributor(s): Ochoa, Idoia
Subject(s): Compression scheme for VCF file
Genomic data storage
Abstract: This thesis introduces specialized compression schemes for integer-type annotations in genomic VCF files. Variant Call Format (VCF) is a text file format. Genomic VCF files contain the genotype information of a collection of samples, i.e., the variants (differences) of a given genome with respect to a reference sequence, together with several important variant annotations. These annotations, such as read depth (DP) and allele frequency (AF), are stored in different data types and are routinely used as input to analysis pipelines, especially in the clinical setting. Easy access to the data is therefore crucial for clinics to facilitate their analysis and meet possible time and memory constraints. With these requirements in mind, the goal of the project is to design compression schemes for VCF files that support fast queries. The main focus of this thesis is a new compression scheme for the RO, QA, and QR annotations in VCF files.
Issue Date: 2019-05
Date Available in IDEALS: 2019-06-13
