Files in this item



Automatic Tunin ... e on Graphics Hardware.pdf (application/pdf, 231kB)


Title:Automatic Tuning of Matrix Multiplication Performance on Graphics Hardware
Author(s):Jiang, Changhao; Snir, Marc
Subject(s):computer graphics
Abstract:Graphics hardware's performance is advancing much faster than the performance of conventional microprocessors. In order to utilize the tremendous computing power of these systems, it is critical to tune software to graphics hardware's architectural features. The frequent changes in GPUs' architecture and performance characteristics make it very desirable for such tuning to be automated. This paper implements an automatic tuning system to generate high-performance matrix-multiplication implementations on graphics hardware. The automatic tuning system uses a parameterized code generator to generate multiple versions of matrix multiplication, whose performance is empirically evaluated by actual execution on the target platform. An ad-hoc search engine is employed to search over the implementation space for the version that yields the best performance. In contrast to similar systems on CPUs, which rely on tuning strategies such as cache blocking, register tiling, and instruction scheduling, this paper identifies and exploits several tuning strategies that are unique to graphics hardware. These tuning strategies include optimizing for multiple render targets, using SIMD instructions with data packing, and overcoming limitations on instruction count and dynamic branch instructions. The generated implementations achieve performance comparable to an expert's manually tuned version, in spite of the significant overhead incurred by the use of the high-level BrookGPU language. As the first attempt at automatic generation of numerical libraries for graphics hardware, the results from this paper are encouraging.
Issue Date:2005-04
Genre:Technical Report
Other Identifier(s):UIUCDCS-R-2005-2558
Rights Information:You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format, BUT this permission is only for a period of 45 (forty-five) days from the most recent time that you verified that this technical report is still available from the University of Illinois at Urbana-Champaign Computer Science Department under terms that include this permission. All other rights are reserved by the author(s).
Date Available in IDEALS:2009-04-17
