Compiling Vector Programs for SIMD Devices

Ren, Gang

Permalink

https://hdl.handle.net/2142/11226

Description

Title: Compiling Vector Programs for SIMD Devices
Author(s): Ren, Gang
Issue Date: 2006-07
Keyword(s): computer science
Abstract: As an effective way of exploiting data parallelism in applications, SIMD architectures have been adopted by most of today's microprocessors. Intrinsic functions and automatic compilation are the common ways to program today's SIMD devices. However, neither method provides enough programmability and performance at the same time. Many issues must be addressed to generate efficient SIMD code. For example, most SIMD devices support memory accesses only on contiguous and aligned sections, so additional permutation instructions are needed for non-contiguous and/or misaligned references. Such overhead can cancel all performance benefits of SIMD computation. VINCI, or Vector I-code Novel Compilation Infrastructure, is proposed in this thesis. VINCI focuses on translating vector programs into efficient code for SIMD devices. Vectors in input programs can have arbitrary lengths, strides, and alignments. However, vectors required by SIMD devices must have the same fixed length, unit stride, and aligned addresses. VINCI employs a sequence of program transformations to convert all vectors into this specific format. VINCI also includes several optimization algorithms; among them, the algorithm for optimizing data permutations is of particular importance. By unifying all forms of data permutation into an explicit representation, this algorithm can reduce the number of data permutations in vector programs by propagating them across statements and merging them whenever possible. In addition, an efficient code generation algorithm is included to generate native permutation instructions from vector permutation operations. Common compiler analyses and optimizations, for example def-use analysis and copy propagation, were also extended to the vector representation and included in VINCI. Two domain-specific optimization techniques for DSP programs were likewise extended to vector programs. These optimizations are necessary to deliver the final performance on SIMD devices. VINCI was implemented in the HiLO compiler, an internal compiler used in SPIRAL. Experiments were conducted on two platforms, VMX and SSE2. Test applications include both automatically generated programs and manually written kernels. The results show that up to 77% of the permutation instructions are eliminated and, as a result, the average performance improvement is 48% on VMX and 68% on SSE2. For several applications, near-perfect speedups were achieved on both platforms.
Type of Resource: text
Permalink: http://hdl.handle.net/2142/11226
Copyright and License Information: You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format, BUT this permission is only for a period of 45 (forty-five) days from the most recent time that you verified that this technical report is still available from the University of Illinois at Urbana-Champaign Computer Science Department under terms that include this permission. All other rights are reserved by the author(s).
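The abstract's central point, that misaligned references cost extra permutations and that these can be reduced by representing permutations explicitly, propagating them, and merging them, can be illustrated with a small model. The following is a minimal Python sketch under stated assumptions, not VINCI's implementation or the HiLO compiler's API: the lane count `VLEN = 4`, the index-list representation of a permutation, and every helper name (`aligned_load`, `permute2`, `misaligned_load`, `apply_perm`, `merge`, `fold_chain`, `vadd`) are hypothetical. The two-source `permute2` is only loosely modeled on byte-select permutes such as VMX `vec_perm`, and it selects whole lanes instead of bytes.

```python
from typing import List, Sequence

# Hypothetical model parameters and helpers; not VINCI's or HiLO's actual API.
VLEN = 4  # assumed number of lanes per SIMD register

Perm = List[int]  # a permutation as an index list: out[i] = src[perm[i]]


def aligned_load(mem: Sequence[int], addr: int) -> List[int]:
    """Model a SIMD load that accepts only aligned addresses."""
    if addr % VLEN != 0:
        raise ValueError(f"address {addr} is not {VLEN}-element aligned")
    return list(mem[addr:addr + VLEN])


def permute2(a: List[int], b: List[int], pattern: Perm) -> List[int]:
    """Two-source permutation: index k < VLEN picks a[k], otherwise b[k - VLEN]."""
    both = a + b
    return [both[k] for k in pattern]


def misaligned_load(mem: Sequence[int], addr: int) -> List[int]:
    """Emulate an unaligned vector load with aligned loads plus one permutation."""
    offset = addr % VLEN
    base = addr - offset
    lo = aligned_load(mem, base)
    if offset == 0:
        return lo  # already aligned: no permutation needed
    hi = aligned_load(mem, base + VLEN)
    return permute2(lo, hi, [offset + i for i in range(VLEN)])


def apply_perm(x: List[int], p: Perm) -> List[int]:
    """Single-source lane permutation."""
    return [x[k] for k in p]


def merge(p: Perm, q: Perm) -> Perm:
    """Compose 'apply p, then apply q' into one permutation."""
    return [p[k] for k in q]


def is_identity(p: Perm) -> bool:
    return all(k == i for i, k in enumerate(p))


def fold_chain(perms: List[Perm]) -> List[Perm]:
    """Merge permutations applied in order; drop the result if it is the identity."""
    r = list(range(VLEN))
    for p in perms:
        r = merge(r, p)
    return [] if is_identity(r) else [r]


def vadd(a: List[int], b: List[int]) -> List[int]:
    """Lane-wise addition; a lane-wise op commutes with a shared permutation."""
    return [x + y for x, y in zip(a, b)]


if __name__ == "__main__":
    mem = list(range(16))
    print(misaligned_load(mem, 5))    # [5, 6, 7, 8]
    rev = [3, 2, 1, 0]
    print(fold_chain([rev, rev]))     # [] (both reversals eliminated)
    rot1 = [1, 2, 3, 0]
    print(fold_chain([rot1, rot1]))   # [[2, 3, 0, 1]] (two permutations become one)
    a, b = [1, 2, 3, 4], [10, 20, 30, 40]
    # Propagation: permuting both operands equals permuting the sum once.
    print(vadd(apply_perm(a, rev), apply_perm(b, rev)) == apply_perm(vadd(a, b), rev))  # True
```

In this model an unaligned load costs two aligned loads and one permutation, and any chain of permutations applied to the same vector folds into at most one, or into none when the composition is the identity. The last check shows why a permutation can be moved across a lane-wise statement, which is what makes propagating permutations toward each other, and then merging them, possible.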

Owning Collections

Research and Tech Reports - Computer Science PRIMARY
