Files in this item

KO-DISSERTATION-2017.pdf (application/pdf, 10 MB) — no description provided


Title: Sampling architectures for probabilistic inference
Author(s): Ko, Glenn Gihyun
Director of Research: Rutenbar, Rob A.
Doctoral Committee Chair(s): Rutenbar, Rob A.
Doctoral Committee Member(s): Chen, Deming; Shanbhag, Naresh R.; Smaragdis, Paris; Nurvitadhi, Eriko
Department / Program: Electrical & Computer Eng
Discipline: Electrical & Computer Engr
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Machine learning; Probabilistic graphical model; Probabilistic inference; Markov chain Monte Carlo; Gibbs sampling
Abstract: In recent years, machine learning (ML) algorithms for applications such as computer vision, machine listening, and topic modeling (i.e., extracting thematic structure from large text data sets) have proven effective in terms of perceived quality. However, these ML applications tend to be compute-intensive and create performance challenges. We focus on hardware accelerator architectures for inference on probabilistic graphical models, in particular Markov random fields (MRFs) and latent Dirichlet allocation (LDA). Our work centers on inference via sampling methods, specifically Markov chain Monte Carlo (MCMC) methods. Roughly speaking, we generate samples from the distribution over labels implied by the structure of the graphical model, and use results computed from those samples to approximate the results we seek. MCMC methods are extraordinarily popular and widely used in inference tasks, especially with very large models, but they are not commonly seen as either "fast" or "low power"; these are the challenges we seek to address. However, performance is not our only concern. We focus on two applications to drive this research. First, we explore sound source separation, which can be used to separate a human voice from background noise on mobile phones, e.g., while talking on a cell phone in an airport. The challenges involved are real-time execution and power constraints. As a solution, we present a novel hardware-based sound source separation implementation capable of real-time streaming performance. The implementation uses a Markov random field inference formulation of foreground/background separation, and targets voice separation on mobile phones with two microphones. We demonstrate a real-time streaming FPGA implementation running at 150 MHz with a total of 207 KB of RAM. Our implementation achieves a 22X speedup over a conventional software implementation, achieves an SDR of 7.021 dB with 1.6 ms latency, and exhibits excellent perceived audio quality.
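The MRF inference style described above can be illustrated with a minimal sketch: a Gibbs sampler over a binary-labeled grid MRF, where each site is resampled from its full conditional given its data cost and its 4-connected neighbors. This is only an illustrative Python sketch under assumed Potts-model energies; the function name, data layout, and parameters are not from the dissertation, whose actual implementation is a streaming FPGA design.

```python
import math
import random

def gibbs_sweep(labels, unary, beta, width, height, rng):
    """One Gibbs sweep over a binary MRF on a 4-connected grid.

    labels: dict (x, y) -> 0/1, current label assignment
    unary:  dict (x, y) -> (cost0, cost1), per-site data costs
    beta:   smoothness weight penalizing disagreeing neighbors (Potts model)
    """
    for y in range(height):
        for x in range(width):
            energies = []
            for lab in (0, 1):
                e = unary[(x, y)][lab]
                for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
                    if (nx, ny) in labels and labels[(nx, ny)] != lab:
                        e += beta  # pay the Potts penalty per disagreeing neighbor
                energies.append(e)
            # Sample from the full conditional: p(label) ∝ exp(-E(label))
            p1 = 1.0 / (1.0 + math.exp(energies[1] - energies[0]))
            labels[(x, y)] = 1 if rng.random() < p1 else 0
    return labels
```

In a foreground/background separation setting, the unary costs would come from the audio features and the sampled labels would mark which time-frequency cells belong to the voice versus the background.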
A virtual ASIC design shows that this architecture is quite small (fewer than 10 million gates), consumes only 70 mW, and appears amenable to further power optimization. The second application is an enterprise-scale one, topic modeling, which is used to extract the hidden thematic structure of large sets of documents. Enterprise-scale clusters are usually required to run such massive tasks. We explore the potential benefits of accelerating topic models and quantify the speed/power trade-offs of building hardware accelerators for them. We took latent Dirichlet allocation, a probabilistic topic model, with a Gibbs sampling inference implementation, and profiled it to show that 96% of the run time is spent sampling, which therefore became the main focus of the acceleration. We describe a parallel FPGA architecture, running at 220 MHz, that is theoretically bounded only by memory bandwidth and in which even a single core is faster than a workstation-grade CPU core. Lastly, we share our findings on accelerating parallel versions of the Gibbs sampling algorithm, and examine precision requirements and the potential for large reductions in the number of bits used to perform Gibbs sampling inference in applications such as source separation. We implement multi-threaded C++ and CUDA GPU implementations of chromatic Gibbs sampling, a parallel variant of Gibbs sampling that uses a graph-coloring scheme to construct Markov chains that can be executed in parallel. We show 1.9X and 22.9X speedups, respectively, compared to a conventional single-core version running on an Intel Xeon. Furthermore, our analysis of the precision and dynamic range of the source separation application shows that only an 8-bit reduced floating-point representation is required to maintain a very low decision error rate in the Gibbs sampler. These early results point toward reduced-precision asynchronous Gibbs sampling architectures.
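The graph-coloring idea behind chromatic Gibbs sampling can be sketched briefly: color the graphical model so that no two adjacent variables share a color, then resample each color class as a batch, since variables of the same color are conditionally independent given the rest. The Python sketch below uses the checkerboard 2-coloring of a 4-connected grid; the batched loop is where the dissertation's C++ threads or CUDA kernels would run in parallel. All names and the Potts-model energies here are illustrative assumptions, not the dissertation's code.

```python
import math
import random

def chromatic_gibbs_sweep(labels, unary, beta, width, height, rng):
    """One chromatic Gibbs sweep over a binary MRF on a 4-connected grid.

    Such a grid is 2-colorable (checkerboard), so all sites of one color are
    conditionally independent given the other color: each color class can be
    resampled as a parallel batch. Here the batch is a plain loop, but it
    could be dispatched to worker threads or a GPU kernel unchanged.
    """
    for color in (0, 1):
        batch = [(x, y) for y in range(height) for x in range(width)
                 if (x + y) % 2 == color]
        # Every site in the batch reads only opposite-color neighbors, which
        # are fixed during this pass, so update order within the batch is free.
        for x, y in batch:
            e0, e1 = unary[(x, y)]
            for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
                if (nx, ny) in labels:
                    if labels[(nx, ny)] != 0:
                        e0 += beta  # neighbor disagrees with label 0
                    if labels[(nx, ny)] != 1:
                        e1 += beta  # neighbor disagrees with label 1
            p1 = 1.0 / (1.0 + math.exp(e1 - e0))
            labels[(x, y)] = 1 if rng.random() < p1 else 0
    return labels
```

The design choice is the classic serial-vs-parallel trade: a plain sweep visits sites one at a time, while the chromatic variant trades a one-time coloring step for batches whose size, on a 2-colorable grid, is half the model, which is what makes the reported GPU speedup possible.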
Issue Date: 2017-11-17
Rights Information: Copyright 2017 Glenn Gihyun Ko
Date Available in IDEALS: 2018-03-13
Date Deposited: 2017-12
