Files in this item

9210894.pdf (application/pdf, 6MB), restricted to U of Illinois

Title: Processor parallelism considerations and memory latency reduction in shared memory multiprocessors
Author(s): Lilja, David John
Doctoral Committee Chair(s): Yew, Pen-Chung
Department / Program: Electrical and Computer Engineering
Discipline: Electrical and Computer Engineering
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Engineering, Electronics and Electrical; Computer Science
Abstract: A wide variety of computer architectures have been proposed to exploit parallelism at different granularities. These architectures have significant differences in instruction scheduling constraints, memory latencies, and synchronization overhead, making it difficult to determine which architecture can achieve the best performance on a given program. Trace-driven simulations and analytic models are used to compare the instruction-level parallelism of a superscalar processor and a pipelined processor with the loop-level parallelism of a shared memory multiprocessor. It is shown that the maximum speedup for a loop with a cyclic dependence graph is limited by its critical dependence ratio, independent of the number of iterations in the loop. The fine-grained processors are better suited for executing these loops with cyclic dependence graphs, while the multiprocessor has better performance on the very parallel loops with acyclic dependence graphs. When executing programs with a variety of loops and sequential code, the best performance is obtained using a multiprocessor architecture in which each individual processor has a fine-grained parallelism of two to four.
A major problem with this type of shared memory multiprocessor architecture is the long latency in fetching operands from the shared memory. Private data caches are an effective means of reducing this latency, but they introduce the complexity of a cache coherence mechanism. Both hardware and software schemes have been proposed for maintaining coherence in these systems. Unfortunately, hardware schemes have very high memory requirements, and software schemes rely on imprecise compile-time memory disambiguation. A new compiler-assisted directory coherence mechanism is proposed that combines the best aspects of the hardware and software approaches while eliminating many of their disadvantages. The pointer cache directory significantly reduces the size of a hardware directory by dynamically binding pointers to cache blocks only when the blocks are actually referenced. Compiler optimizations can further reduce the size of the directory by signaling the hardware to allocate pointers only when they are needed. Detailed trace-driven simulations show that the performance of this new approach is comparable to other coherence schemes, but with significantly lower memory requirements.
Issue Date: 1991
Rights Information: Copyright 1991 Lilja, David John
Date Available in IDEALS: 2011-05-07
Identifier in Online Catalog: AAI9210894
OCLC Identifier: (UMI)AAI9210894
