Files in this item

Compiling Vector Programs for SIMD Devices.pdf (application/pdf, 645kB)


Title: Compiling Vector Programs for SIMD Devices
Author(s): Ren, Gang
Subject(s): computer science
Abstract: As an effective way of exploiting data parallelism in applications, SIMD architectures have been adopted by most of today's microprocessors. Intrinsic functions and automatic compilation are the common programming methods for today's SIMD devices. However, neither method provides enough programmability and performance at the same time. Many issues must be addressed to generate efficient SIMD code. For example, most SIMD devices only support memory accesses on contiguous and aligned sections. Additional permutation instructions are needed for non-contiguous and/or misaligned references. Such overhead can cancel all performance benefits from SIMD computation. VINCI, or Vector I-code Novel Compilation Infrastructure, is proposed in this thesis. VINCI focuses on translating vector programs into efficient code for SIMD devices. Vectors in input programs can have arbitrary lengths, strides, and alignment settings. However, vectors required by SIMD devices must have the same fixed length, unit strides, and aligned addresses. VINCI employs a sequence of program transformations to convert all vectors into this specific format. VINCI also includes several optimization algorithms. The optimization algorithm for data permutations is of great importance. By unifying all forms of data permutations into an explicit representation, the optimization algorithm can reduce the number of data permutations in vector programs by propagating them across statements and merging them whenever possible. In addition, an efficient code generation algorithm is included to generate native permutation instructions from vector permutation operations. Many common compiler analyses and optimizations were also extended for the vector representation and included in VINCI. Two examples are def-use analysis and copy propagation. In addition, two domain-specific optimization techniques for DSP programs are also extended for vector programs. These optimizations are necessary to deliver the final performance on SIMD devices. VINCI was implemented on the HiLO compiler, an internal compiler used in SPIRAL. Experiments were conducted on two platforms, VMX and SSE2. Test applications include both automatically generated programs and manually written kernels. The results show that up to 77% of the permutation instructions are eliminated and, as a result, the average performance improvement is 48% on VMX and 68% on SSE2. For several applications, near-perfect speedups have been achieved on both platforms.
Issue Date: 2006-07
Genre: Technical Report
Other Identifier(s): UIUCDCS-R-2006-2737
Rights Information: You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format, BUT this permission is only for a period of 45 (forty-five) days from the most recent time that you verified that this technical report is still available from the University of Illinois at Urbana-Champaign Computer Science Department under terms that include this permission. All other rights are reserved by the author(s).
Date Available in IDEALS: 2009-04-21