Files in this item: Izzat_El Hajj.pdf (application/pdf, 1MB)

Title: Dynamic loop vectorization for executing OpenCL kernels on CPUs
Author(s): El Hajj, Izzat
Advisor(s): Hwu, Wen-Mei W.
Department / Program: Electrical & Computer Eng
Discipline: Electrical & Computer Engr
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Open computing language (OpenCL); Central processing units (CPUs); Control Divergence; Performance Portability
Abstract: Heterogeneous computing platforms are becoming increasingly important in supercomputing. Many systems now integrate CPUs and GPUs that cooperate on a single node, and much effort is invested in tuning GPU kernels. However, some systems have no GPUs, or their GPUs may be busy, and maintaining separate versions of the same code for GPUs and CPUs is expensive. It would therefore be ideal to retarget GPU-optimized kernels to run efficiently on a CPU. Many efforts have been made to compile OpenCL kernels to run efficiently on CPUs. Such approaches typically run work-groups in parallel on different CPU threads and execute the work-items within a work-group on a single thread, either serially via loop-based serialization or in parallel via SIMD vectorization. SIMD vectorization is particularly difficult when control divergence is present. This thesis proposes a technique for transforming divergent loops in OpenCL kernels so that vectorization opportunities can be extracted when possible and memory access patterns can be improved. The transformations presented show promising speedups for kernels that follow GPU programming best practices, and slowdowns for kernels that do not.
Issue Date: 2014-05-30
Rights Information: Copyright 2014 Izzat El Hajj
Date Available in IDEALS: 2014-05-30
Date Deposited: 2014-05
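
As a rough illustration of the loop-based serialization mentioned in the abstract, the C sketch below runs every work-item of a work-group in a plain loop on one thread. All names (`kernel_body`, `run_work_group`, `run_ndrange`) and the row-sum workload are hypothetical and are not taken from the thesis. The sketch shows only the baseline serialization and where a divergent loop arises; it does not attempt to reproduce the transformation the thesis proposes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel body for a single work-item: sums one
 * variable-length row. The trip count row_len[gid] differs between
 * work-items, which is the kind of control divergence that makes
 * SIMD vectorization across work-items difficult. */
static void kernel_body(size_t gid, const int *row_start, const int *row_len,
                        const float *vals, float *out)
{
    float acc = 0.0f;
    for (int k = 0; k < row_len[gid]; ++k)   /* divergent inner loop */
        acc += vals[row_start[gid] + k];
    out[gid] = acc;
}

/* Loop-based serialization: one CPU thread executes the work-items of
 * a work-group one after another. */
static void run_work_group(size_t group_id, size_t group_size,
                           const int *row_start, const int *row_len,
                           const float *vals, float *out)
{
    assert(group_size > 0);
    for (size_t lid = 0; lid < group_size; ++lid)
        kernel_body(group_id * group_size + lid,
                    row_start, row_len, vals, out);
}

/* Work-groups are independent of each other. They run in a sequential
 * loop here for simplicity; a real runtime could hand each iteration
 * to a different CPU thread. */
void run_ndrange(size_t num_groups, size_t group_size,
                 const int *row_start, const int *row_len,
                 const float *vals, float *out)
{
    for (size_t g = 0; g < num_groups; ++g)
        run_work_group(g, group_size, row_start, row_len, vals, out);
}
```

In this sketch the loop over `lid` is the natural candidate for vectorization, but because each work-item's inner loop may run a different number of iterations, a compiler cannot simply execute several work-items in lockstep in SIMD lanes.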
