Automatic translation of CUDA to OpenCL and comparison of performance optimizations on GPUs
Nandakumar, Deepthi
Permalink
https://hdl.handle.net/2142/24279
Description
Title
Automatic translation of CUDA to OpenCL and comparison of performance optimizations on GPUs
Author(s)
Nandakumar, Deepthi
Issue Date
2011-05-25T15:04:13Z
Director of Research (if dissertation) or Advisor (if thesis)
Hwu, Wen-Mei W.
Department of Study
Electrical and Computer Engineering
Discipline
Electrical and Computer Engineering
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Date of Ingest
2011-05-25T15:04:13Z
Keyword(s)
Performance optimizations on graphics processing units (GPUs)
OpenCL
Performance Portability
Abstract
As an open, royalty-free framework for writing programs that execute across heterogeneous platforms, OpenCL gives programmers access to a variety of data-parallel processors, including CPUs, GPUs, the Cell processor, and DSPs. All OpenCL-compliant implementations support a core specification, ensuring robust functional portability of any OpenCL program. This thesis presents the CUDAtoOpenCL source-to-source tool, which translates code from CUDA to OpenCL so that applications can run on a variety of devices. However, current compiler optimizations are not sufficient to carry the performance of a single expression of a program across a wide variety of architectures. To achieve true performance portability, an open standard like OpenCL needs to be augmented with automatic high-level optimization and transformation tools that can generate optimized code and configurations for any target device.
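As a rough illustration of the kind of rewriting such a translator performs (this sketch is not drawn from the thesis itself; the vecAdd kernel is a generic example), a CUDA kernel and its OpenCL counterpart differ mainly in the kernel qualifier, address-space annotations, and thread-index intrinsics:

    // CUDA: element-wise vector addition
    __global__ void vecAdd(const float *a, const float *b, float *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
        if (i < n)
            c[i] = a[i] + b[i];
    }

    // OpenCL: the same kernel after translation
    __kernel void vecAdd(__global const float *a, __global const float *b,
                         __global float *c, int n)
    {
        int i = get_global_id(0);  // replaces blockIdx.x*blockDim.x + threadIdx.x
        if (i < n)
            c[i] = a[i] + b[i];
    }

On the host side, the cudaMalloc and kernel<<<grid, block>>> launch calls map onto clCreateBuffer and clEnqueueNDRangeKernel, with the CUDA block dimensions becoming the OpenCL local work size and grid times block becoming the global work size.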
This thesis describes the design and implementation of the CUDAtoOpenCL translator, which is built on the Cetus compiler framework. It also presents key insights from our studies optimizing selected benchmarks for two distinct GPU architectures: the NVIDIA GTX280 and the ATI Radeon HD 5870. The results show that the type and degree of optimization applied to each benchmark must be adapted to the specifications of the target architecture. In particular, the two devices differ in the architecture of the basic compute unit, register file organization, on-chip memory capacity, DRAM coalescing behavior, and floating-point unit throughput, and each of these factors interacts differently with each optimization.
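As one concrete, hedged example of such an interaction (not taken from the thesis benchmarks): whether neighboring work-items touch neighboring addresses determines how well a device can coalesce DRAM requests, and the abstract notes that the two devices' coalescing behavior differs. The two OpenCL kernels below copy a row-major matrix and differ only in which work-item dimension walks along a row:

    // Strided: consecutive work-items (dimension 0) touch addresses
    // `width` floats apart, which is hard to coalesce on most GPUs.
    __kernel void copy_strided(__global const float *in, __global float *out,
                               int width)
    {
        int row = get_global_id(0);   // fastest-varying dimension
        int col = get_global_id(1);
        out[row * width + col] = in[row * width + col];
    }

    // Coalesced: consecutive work-items read and write consecutive floats,
    // so each warp/wavefront request can be served by few memory transactions.
    __kernel void copy_coalesced(__global const float *in, __global float *out,
                                 int width)
    {
        int col = get_global_id(0);   // fastest-varying dimension
        int row = get_global_id(1);
        out[row * width + col] = in[row * width + col];
    }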