|Title:||Performance evaluation of vector machine architectures|
|Doctoral Committee Chair(s):||Davidson, Edward S.|
|Department / Program:||Electrical and Computer Engineering|
|Degree Granting Institution:||University of Illinois at Urbana-Champaign|
|Subject(s):||Engineering, Electronics and Electrical|
|Abstract:||Vector machines are well known for their high peak performance, but the delivered performance varies greatly over different workloads and depends strongly on compiler optimizations. Recently it has been claimed that several horizontal superscalar architectures, e.g., VLIW and polycyclic architectures, provide a more balanced performance across a wider range of scientific workloads than do vector machines. The purpose of this research is to study the performance of register-register vector processors, such as Cray supercomputers, as a function of their architectural features, scheduling schemes, compiler optimization capabilities, and program parameters. The results of this study also provide a basis for comparing vector machines with horizontal superscalar machines.
An evaluation methodology, based on timing parameters, bottlenecks, and run time bounds, is developed. Cray-1 performance is degraded by the multiple memory loads of index-misaligned vectors and the inability of the Cray Fortran Compiler (CFT) to produce code that hits all the chain slot times. The Cray X-MP processor has three memory ports and supports flexible chaining, but its vector register reservation scheme poses a problem for the current CFT compilers, thereby reducing execution concurrency. The causes of the performance differences between two Cray Fortran compilers, CFT1.14 and CFT77(1.3), on the vectorized Livermore Fortran Kernels (LFKs) are identified, and some areas for further improvement are suggested.
The impact of chaining and two instruction scheduling schemes on one-memory-port vector supercomputers, illustrated by the Cray-1 and Cray-2, is studied. The lack of instruction chaining on the Cray-2 requires a different instruction scheduling scheme from that of the Cray-1. Situations are characterized in which simple vector scheduling can generate code that fully utilizes one functional unit for machines with chaining. Even without chaining, polycyclic scheduling guarantees full utilization of one functional unit, after an initial transient, for loops with acyclic dependence graphs.
The effectiveness of applying polycyclic vector scheduling (PVS) to the Cray-2 is compared with optimal simple vector scheduling on the Cray-1. More than 30% performance improvement on several vectorized LFKs is achieved by PVS over the current CFT77(2.0) compiler on the Cray-2. Some hardware modifications that could improve the effectiveness of applying PVS are evaluated.
|Rights Information:||Copyright 1989 Tang, Ju-ho|
|Date Available in IDEALS:||2011-05-07|
|Identifier in Online Catalog:||AAI8924954|
This item appears in the following Collection(s)
Graduate Dissertations and Theses at Illinois
Graduate Theses and Dissertations at Illinois
Dissertations and Theses - Electrical and Computer Engineering
Dissertations and Theses in Electrical and Computer Engineering