Files in this item

LI-THESIS-2021.pdf (1MB, PDF), restricted to U of Illinois. No description provided.

Description

Title: Improved GPU implementations of the Pair-HMM forward algorithm for DNA sequence alignment
Author(s): Li, Enliang
Advisor(s): Chen, Deming
Department / Program: Electrical & Computer Eng
Discipline: Electrical & Computer Engr
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: M.S.
Genre: Thesis
Subject(s): GPU; Hardware Acceleration; Pair-HMM; CUDA implementation; Computational Genomics
Abstract: With the rise of Next-Generation Sequencing (NGS), clinical sequencing services have become more accessible but also face new challenges. As the close connection between key deoxyribonucleic acid (DNA) mutation sites and major diseases or conditions has been discovered, the need for computational genomics has increased significantly. This surging demand motivates the development of more efficient algorithms for genome assembly, error correction, k-mer counting, and related tasks. In this thesis, we focus on DNA sequencing analysis, one of the fastest-growing markets in NGS, and its related alignment problems. In recent years, many new hardware technologies and algorithms have been investigated for their potential applications in massively parallel sequencing. Emerging hardware platforms such as GPUs, FPGAs, and ASICs provide parallel processing resources; we choose the GPU as our computation platform for its massively parallel processing capabilities. The Forward Algorithm (FA) remains one of the most commonly used methods for solving sequence alignment problems modeled as a pair hidden Markov model (Pair-HMM). The Pair-HMM FA is both computation-intensive and data-intensive. Several previous works have accelerated the FA by massively parallelizing its workload. In this thesis, we introduce further optimizations that not only improve the computational concurrency of both the initialization process and the Pair-HMM FA but also reduce the communication overhead between the host and the devices. We discuss general principles for optimizing the Forward Algorithm on GPUs and present an improved implementation of the Pair-HMM FA in native CUDA C++. Our design achieves a speedup of 25.10x over the C++ baseline on the GATK HaplotypeCaller Pair-HMM workload, using a portion of the real NA12878 human genome dataset. This is a major improvement, outperforming the state-of-the-art implementation by a margin of 60%.
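To make the workload in the abstract concrete, the following is a minimal CPU sketch of a Pair-HMM forward pass in C++. It is not the thesis implementation. It assumes hypothetical constant gap-open and gap-extension probabilities (`PairHmmParams`), a simplified emission model derived from Phred base qualities (`prior`), and plain double-precision probabilities without the underflow scaling that GATK applies. The names `pair_hmm_forward`, `prior`, and `PairHmmParams` are illustrative only. Cells are visited one anti-diagonal at a time, which exposes the independence that GPU implementations typically exploit.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// Hypothetical, illustrative parameters. GATK's HaplotypeCaller instead derives
// per-base gap probabilities from insertion/deletion quality scores.
struct PairHmmParams {
    double gap_open = 1e-3;  // match -> insertion and match -> deletion
    double gap_ext = 0.1;    // insertion -> insertion and deletion -> deletion
};

// Simplified emission probability of read base r (Phred quality q) given
// haplotype base h; 'N' on either side is treated as a match.
static double prior(char r, char h, int q) {
    const double err = std::pow(10.0, -q / 10.0);
    return (r == h || r == 'N' || h == 'N') ? 1.0 - err : err / 3.0;
}

// Forward algorithm over the match (M), insertion (X), and deletion (Y)
// matrices, returning the read likelihood under this simplified model in
// linear space. Cells are visited one anti-diagonal (i + j == d) at a time:
// each cell reads only diagonals d-1 and d-2, so the inner loop carries no
// dependency between its iterations.
double pair_hmm_forward(const std::string& read, const std::vector<int>& qual,
                        const std::string& hap, const PairHmmParams& p) {
    assert(qual.size() == read.size() && !hap.empty());
    const int R = static_cast<int>(read.size());
    const int H = static_cast<int>(hap.size());
    const double mm = 1.0 - 2.0 * p.gap_open;  // match -> match
    const double gm = 1.0 - p.gap_ext;         // insertion or deletion -> match
    std::vector<std::vector<double>> M(R + 1, std::vector<double>(H + 1, 0.0));
    std::vector<std::vector<double>> X = M;
    std::vector<std::vector<double>> Y = M;
    // The read may begin anywhere on the haplotype: uniform start mass in row 0 of Y.
    for (int j = 0; j <= H; ++j) Y[0][j] = 1.0 / H;

    for (int d = 2; d <= R + H; ++d) {
        const int i_lo = std::max(1, d - H);
        const int i_hi = std::min(R, d - 1);
        for (int i = i_lo; i <= i_hi; ++i) {  // iterations are independent
            const int j = d - i;
            M[i][j] = prior(read[i - 1], hap[j - 1], qual[i - 1]) *
                      (M[i - 1][j - 1] * mm + (X[i - 1][j - 1] + Y[i - 1][j - 1]) * gm);
            X[i][j] = M[i - 1][j] * p.gap_open + X[i - 1][j] * p.gap_ext;
            Y[i][j] = M[i][j - 1] * p.gap_open + Y[i][j - 1] * p.gap_ext;
        }
    }
    // The whole read must be consumed: sum match and insertion states in the last row.
    double likelihood = 0.0;
    for (int j = 1; j <= H; ++j) likelihood += M[R][j] + X[R][j];
    return likelihood;
}
```

Because every cell with the same i + j can be computed concurrently, a CUDA port of this sketch could map the inner loop onto threads and synchronize between diagonals. Whether the thesis kernels use exactly this mapping is not stated in the abstract.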
Issue Date: 2021-04-30
Type: Thesis
URI: http://hdl.handle.net/2142/110760
Rights Information: Copyright 2021 Enliang Li
Date Available in IDEALS: 2021-09-17
Date Deposited: 2021-05

