Files in this item

application/pdf  GAO-DISSERTATION-2020.pdf (11MB)  Restricted to U of Illinois


Title: A pixel-parallel architecture for graph cuts inference
Author(s): Gao, Tianqi
Director of Research: Rutenbar, Rob A.
Doctoral Committee Chair(s): Rutenbar, Rob A.
Doctoral Committee Member(s): Chen, Deming; Forsyth, David; Wong, Martin; Blank, William Thomas
Department / Program: Electrical & Computer Eng
Discipline: Electrical & Computer Engr
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Hardware accelerator, FPGA, Machine Learning, Computer Vision, Markov Random Fields, Graph Cuts, Computer Architecture
Abstract: In recent years, advances in machine learning techniques have greatly improved the perceived quality of many real-life applications such as computer vision and machine listening. But most machine learning techniques require massive computing power. In practice, deploying such techniques raises challenges for hardware design, since traditional computing systems are not suitable to fully support computationally intensive machine learning algorithms. This dissertation reports on the design and implementation of custom hardware for fast and efficient machine learning applications. In the field of machine learning, the process of making the most likely decision based on observations is referred to as inference, which often requires efficient algorithms. The method of graph cuts converts a maximum a posteriori (MAP) inference problem on Markov random fields (MRFs) into a network flow problem, which can be solved in a direct manner. Many computer vision problems can be conveniently cast as an inference task to find the most likely label for each pixel. The method is thus widely used, but computationally burdensome given the need to run iterative network flow computations on every pixel of an image. Prior accelerator attempts, on either GPU or FPGA, have failed to exploit the problem's attractive, maximum available parallelism: push-relabel style network flow solvers can run in parallel across every pixel of an image. In this dissertation, we describe the design and implementation of the first pixel-parallel graph cuts inference accelerator. The architecture is scalable, and relies on predominantly local neighbor-calculation among pixels. A checkerboard scheduling scheme takes advantage of maximum parallelism while avoiding critical data dependencies in the push-relabel network flow computation. We first demonstrate the advantage of our pixel-parallel architecture by implementing two pixel-processor arrays with 256 processors and 1024 processors on FPGAs.
Our implementations can accelerate a segmentation task on cropped areas of images, referred to as image tiles, of 8-by-32 pixels or 16-by-64 pixels. In experiments, our accelerators achieved about 300-400X speedups over a sequential implementation. Next, to overcome the physical constraints of hardware resources and memory size, and to accelerate graph cuts on larger images, we developed methods to divide images into virtual regions, which can then be efficiently processed on a physical processor array. We then designed a complete memory system for the inference engine, from on-chip to off-chip memory. To achieve better performance, other design challenges are also addressed, such as the need to implement a hardware-friendly relabel heuristic for faster convergence. Experimental results show that the proposed virtual-image system achieved 5-7X speedups over other FPGA accelerators, and about a 2X speedup over an open-source GPU implementation, on benchmark images of sizes ranging from 320-by-480 to 640-by-480.
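The checkerboard scheduling idea described in the abstract rests on a simple structural property of the pixel grid. The sketch below (an illustrative Python model, not the dissertation's hardware design; all function names are hypothetical) shows why: in a 4-connected grid, any two neighboring pixels always have opposite checkerboard colors, so all same-colored pixels can perform push/relabel updates in parallel without read-write conflicts, alternating between the two colors.

```python
# Illustrative sketch of the checkerboard-parallelism property assumed
# by pixel-parallel push-relabel scheduling. Hypothetical helper names.

def parity(r, c):
    """Checkerboard color of a pixel: 0 (black) or 1 (white)."""
    return (r + c) % 2

def neighbors(r, c, h, w):
    """4-connected neighbors of pixel (r, c) on an h-by-w grid."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < h and 0 <= nc < w:
            yield nr, nc

def conflict_free(h, w):
    """True if no two same-colored pixels are neighbors, i.e. each
    half-sweep of a checkerboard schedule has no data hazards."""
    return all(parity(nr, nc) != parity(r, c)
               for r in range(h) for c in range(w)
               for nr, nc in neighbors(r, c, h, w))
```

Under this schedule, one full sweep of the grid takes two half-iterations (black pixels, then white), regardless of image size, which is what lets a processor array update every pixel it owns simultaneously.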
Issue Date: 2020-05-07
Rights Information: Copyright 2020 Tianqi Gao
Date Available in IDEALS: 2020-08-26
Date Deposited: 2020-05
