Files in this item
Files | Description | Format |
---|---|---|
 | (no description provided) | application/pdf |
Description
Title: | Improved GPU implementations of the Pair-HMM forward algorithm for DNA sequence alignment |
Author(s): | Li, Enliang |
Advisor(s): | Chen, Deming |
Department / Program: | Electrical & Computer Eng |
Discipline: | Electrical & Computer Engr |
Degree Granting Institution: | University of Illinois at Urbana-Champaign |
Degree: | M.S. |
Genre: | Thesis |
Subject(s): | GPU; Hardware Acceleration; Pair-HMM; CUDA implementation; Computational Genomics |
Abstract: | With the rise of Next-Generation Sequencing (NGS), clinical sequencing services have become more accessible but also face new challenges. As close connections between key deoxyribonucleic acid (DNA) mutation sites and major diseases and conditions have been discovered, the need for computational genomics has increased significantly. This surging demand motivates the development of more efficient algorithms for genome assembly, error correction, k-mer counting, and related tasks. In this thesis, we focus on DNA sequencing analysis, one of the fastest-growing markets in NGS, and its related alignment problems. In recent years, many new hardware technologies and algorithms have been investigated for their potential in massively parallel sequencing. The emerging hardware includes GPUs, FPGAs, and other ASICs that provide parallel processing resources. In this thesis, we choose the GPU as our computation platform for its massive parallel processing capabilities. The Forward Algorithm (FA) remains one of the most commonly used methods for solving sequence alignment problems modeled as a Pair Hidden Markov Model (Pair-HMM). The Pair-HMM FA is both compute-intensive and data-intensive. Several previous works have accelerated the FA by massively parallelizing its workload. In this thesis, we add further optimizations, not only improving the computational concurrency of both the initialization process and the Pair-HMM FA but also reducing the communication overhead between the host and the devices. We discuss general principles for optimizing the FA on GPUs and present an improved implementation of the Pair-HMM FA in native CUDA C++. Our design achieves a speedup of 25.10x over the C++ baseline on the GATK HaplotypeCaller Pair-HMM workload, using a portion of the real NA12878 dataset from the human genome database.
This is a major improvement that outperforms the state-of-the-art implementation by a margin of 60%. |
Issue Date: | 2021-04-30 |
Type: | Thesis |
URI: | http://hdl.handle.net/2142/110760 |
Rights Information: | Copyright 2021 Enliang Li |
Date Available in IDEALS: | 2021-09-17 |
Date Deposited: | 2021-05 |
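The abstract names the Pair-HMM Forward Algorithm but does not spell out its recurrence. As a hedged illustration only, here is a minimal single-threaded C++ sketch of a three-state (Match, Insertion, Deletion) Pair-HMM forward pass in the general style of the GATK HaplotypeCaller model. It is not the thesis's implementation. The struct `ReadInput`, the function names, and the choice of match-to-match probability `1 - (ins + del)` are assumptions made for illustration.

```cpp
#include <cmath>
#include <string>
#include <vector>

// Phred quality -> error probability, e.g. Q30 -> 0.001.
double qual_to_err(int q) { return std::pow(10.0, -q / 10.0); }

// Hypothetical input record: one read plus per-base Phred qualities.
struct ReadInput {
  std::string bases;        // read bases
  std::vector<int> base_q;  // base-call qualities
  std::vector<int> ins_q;   // insertion-open qualities
  std::vector<int> del_q;   // deletion-open qualities
  std::vector<int> gcp_q;   // gap-continuation qualities
};

// Returns log10 P(read | haplotype). Assumes a non-empty read and haplotype.
// No rescaling is done, so very long reads may underflow a double.
double pairhmm_forward_log10(const ReadInput& r, const std::string& hap) {
  const size_t n = r.bases.size(), m = hap.size();
  // (n+1) x (m+1) forward matrices for the Match, Insertion and Deletion states.
  std::vector<std::vector<double>> M(n + 1, std::vector<double>(m + 1, 0.0));
  auto I = M, D = M;
  // Initialization: the read may begin at any haplotype position with equal weight.
  for (size_t j = 0; j <= m; ++j) D[0][j] = 1.0 / m;

  for (size_t i = 1; i <= n; ++i) {
    const double e = qual_to_err(r.base_q[i - 1]);
    const double ins = qual_to_err(r.ins_q[i - 1]);
    const double del = qual_to_err(r.del_q[i - 1]);
    const double gcp = qual_to_err(r.gcp_q[i - 1]);
    const double mm = 1.0 - (ins + del);  // match -> match (assumed form)
    const double gm = 1.0 - gcp;          // insertion/deletion -> match
    for (size_t j = 1; j <= m; ++j) {
      const char a = r.bases[i - 1], b = hap[j - 1];
      // Emission prior: correct call, or one of three possible wrong bases.
      const double prior = (a == b || a == 'N' || b == 'N') ? 1.0 - e : e / 3.0;
      // Match consumes a read base and a haplotype base (diagonal move).
      M[i][j] = prior * (M[i - 1][j - 1] * mm + (I[i - 1][j - 1] + D[i - 1][j - 1]) * gm);
      // Insertion consumes a read base only (vertical move).
      I[i][j] = M[i - 1][j] * ins + I[i - 1][j] * gcp;
      // Deletion consumes a haplotype base only (horizontal move).
      D[i][j] = M[i][j - 1] * del + D[i][j - 1] * gcp;
    }
  }
  // The read may end at any haplotype position, in the Match or Insertion state.
  double sum = 0.0;
  for (size_t j = 1; j <= m; ++j) sum += M[n][j] + I[n][j];
  return std::log10(sum);
}
```

A note on why this workload suits a GPU: each cell depends only on its upper-left, upper, and left neighbors, so all cells on one anti-diagonal (constant `i + j`) are independent and can be computed concurrently, which is one common way such kernels are parallelized. Production implementations also typically rescale the initial condition or work in log space to avoid floating-point underflow on long reads.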
This item appears in the following Collection(s)
- Dissertations and Theses - Electrical and Computer Engineering
  Dissertations and Theses in Electrical and Computer Engineering
- Graduate Dissertations and Theses at Illinois
  Graduate Theses and Dissertations at Illinois