Files in this item

Files:Swapnil_Ghike.pdf (873kB)
Description:(no description provided)
Format:PDF (application/pdf)

Description

Title:Effectiveness of program transformations and compilers for directive-based GPU programming models
Author(s):Ghike, Swapnil
Advisor(s):Padua, David A.; Garzaran, Maria J.
Department / Program:Computer Science
Discipline:Computer Science
Degree Granting Institution:University of Illinois at Urbana-Champaign
Degree:M.S.
Genre:Thesis
Subject(s):Graphics Computing Units (GPUs)
Directive-based compilers
OpenACC
Rodinia
PGI
Cray
Performance evaluation
Program transformations
Abstract:Accelerator devices like the General Purpose Graphics Computing Units (GPGPUs) play an important role in enhancing the performance of many contemporary scientific applications. However, programming GPUs using languages like C for CUDA or OpenCL requires a relatively high investment of time, and the resulting programs are often fine-tuned to perform well only on a particular device. The alternative is to program in a conventional, machine-independent notation and use compilers to transform CPU programs to heterogeneous form, either automatically or relying on directives from the programmer. These compilers can offer the benefits of code portability and increased programmer productivity without imposing much penalty on performance. This thesis evaluates the quality of early versions of two compilers, the PGI compiler and the Cray compiler, as tools for translating C programs written for single or multicore CPUs into heterogeneous programs that execute on NVIDIA's GPUs. In our methodology, we apply a sequence of transformations to CPU programs that allow the compilers to form GPU kernels from loops, and then we analyze the impact of each transformation on the performance of compiled programs. Our further evaluation of the performance of 15 application kernels shows that the executables produced by the PGI and Cray compilers can achieve reasonable, and in some cases equivalent, performance compared to hand-written OpenMP and CUDA codes. Our results also show that the Cray compiler managed to produce faster executables for more applications than the PGI compiler. We show that for a heterogeneous program to execute faster, the traditional analyses and optimizations needed for producing a good sequential program are equally if not more valuable compared to those needed to produce a good GPU kernel.
At the end of this thesis, we also provide a set of guidelines to programmers for extracting good performance from the heterogeneous executables produced by the PGI and Cray compilers.
Issue Date:2013-02-03
URI:http://hdl.handle.net/2142/42306
Rights Information:Copyright 2012 Swapnil Ghike
Date Available in IDEALS:2013-02-03
Date Deposited:2012-12

