Files in this item



application/pdf9512325.pdf (6MB)Restricted to U of Illinois
(no description provided)PDF


Title:Compiler optimizations for parallel loops with fine-grained synchronization
Author(s):Chen, Ding-Kai
Doctoral Committee Chair(s):Yew, Pen-Chung
Department / Program:Computer Science
Discipline:Computer Science
Degree Granting Institution:University of Illinois at Urbana-Champaign
Subject(s):Computer Science
Abstract:Loops in scientific and engineering applications provide a rich source of parallelism. In order to obtain a higher level of parallelism, loops with loop-carried dependences, which are largely serialized using the traditional techniques, need to be parallelized with fine-grained synchronization. This approach, so-called DOACROSS parallelization, requires new optimization strategies in order to preserve parallelism while minimizing the amount of inter-processor communication. In this thesis, I examine closely issues involved in the DOACROSS parallelization. There are two focuses in this work: (1) increasing parallelism, and (2) reducing communication overhead. Strategies for four major optimization problems are proposed and described in detail in this thesis. Our statement re-ordering and redundant synchronization optimization can enhance the overlap between iterations with reduced synchronization for loops with uniform dependences (i.e., with constant dependence distance vectors). For loops with non-uniform dependences, a new dependence uniformization scheme is used to compose a small set of uniform dependences which has a small dependence cone. These uniform dependences can be used to preserve all non-uniform dependences, and a small dependence cone size implies a higher speedup and lower communication overhead. Our last loop transformation is for loops with unknown dependences at compile time. It is a runtime parallelization technique with good locality and high concurrency. It has fairly consistent performance because its runtime analysis, which is usually the performance bottleneck of most runtime schemes, requires less global communication. We provide performance measurement and comparison with the schemes previously proposed. The results indicate that these schemes out-perform earlier schemes in terms of higher parallelism and lower communication requirement. These schemes form an integral part of the future high performance parallelizing compilers.
Issue Date:1994
Rights Information:Copyright 1994 Chen, Ding-Kai
Date Available in IDEALS:2011-05-07
Identifier in Online Catalog:AAI9512325
OCLC Identifier:(UMI)AAI9512325

This item appears in the following Collection(s)

Item Statistics