Files in this item



application/pdfAlexandros_Papakonstantinou.pdf (1MB)
(no description provided)PDF


Title:High-level automation of custom hardware design for high-performance computing
Author(s):Papakonstantinou, Alexandros
Director of Research:Chen, Deming
Doctoral Committee Chair(s):Chen, Deming
Doctoral Committee Member(s):Cong, Jason; Hwu, Wen-Mei W.; Wong, Martin D.F.
Department / Program:Electrical & Computer Eng
Discipline:Electrical & Computer Engr
Degree Granting Institution:University of Illinois at Urbana-Champaign
Subject(s):High-level synthesis
Field-Programmable Gate Array (FPGA)
parallel programming
High Performance Computing (HPC)
Abstract:This dissertation focuses on efficient generation of custom processors from high-level language descriptions. Our work exploits compiler-based optimizations and transformations in tandem with high-level synthesis (HLS) to build high-performance custom processors. The goal is to offer a common multiplatform high-abstraction programming interface for heterogeneous compute systems where the benefits of custom reconfigurable (or fixed) processors can be exploited by the application developers. The research presented in this dissertation supports the following thesis: In an increasingly heterogeneous compute environment it is important to leverage the compute capabilities of each heterogeneous processor efficiently. In the case of FPGA and ASIC accelerators this can be achieved through HLS-based flows that (i) extract parallelism at coarser than basic block granularities, (ii) leverage common high-level parallel programming languages, and (iii) employ high-level source-to-source transformations to generate high-throughput custom processors. First, we propose a novel HLS flow that extracts instruction level parallelism beyond the boundary of basic blocks from C code. Subsequently, we describe FCUDA, an HLS-based framework for mapping fine-grained and coarse-grained parallelism from parallel CUDA kernels onto spatial parallelism. FCUDA provides a common programming model for acceleration on heterogeneous devices (i.e. GPUs and FPGAs). Moreover, the FCUDA framework balances multilevel granularity parallelism synthesis using efficient techniques that leverage fast and accurate estimation models (i.e. do not rely on lengthy physical implementation tools). Finally, we describe an advanced source-to-source transformation framework for throughput-driven parallelism synthesis (TDPS), which appropriately restructures CUDA kernel code to maximize throughput on FPGA devices. We have integrated the TDPS framework into the FCUDA flow to enable automatic performance porting of CUDA kernels designed for the GPU architecture onto the FPGA architecture.
Issue Date:2013-02-03
Rights Information:Copyright 2012 Alexandros Papakonstantinou
Date Available in IDEALS:2013-02-03
Date Deposited:2012-12

This item appears in the following Collection(s)

Item Statistics