Title: Memory latency reduction via data prefetching and data forwarding in shared memory multiprocessors
Author(s): Poulsen, David Kristian
Doctoral Committee Chair(s): Yew, Pen-Chung
Department / Program: Electrical and Computer Engineering
Discipline: Electrical Engineering
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Engineering, Electronics and Electrical; Computer Science
Abstract: This dissertation considers the use of data prefetching and an alternative mechanism, data forwarding, for reducing memory latency due to interprocessor communication in cache coherent, shared memory multiprocessors. The benefits of prefetching and forwarding are considered for large, numerical application codes with loop-level and vector parallelism. Data prefetching is applied to these applications using two different multiprocessor prefetching algorithms implemented within a parallelizing compiler. Data forwarding considers array references involved in communication-related accesses between successive parallel loops, rather than within a single loop nest. A hybrid prefetching and forwarding scheme and a compiler algorithm for data forwarding are also presented.
EPG-sim, a system of execution-driven simulation tools for studying parallel architectures, algorithms, and applications, was developed as a prerequisite for this work. EPG-sim performs execution-driven simulation and critical path simulation within a single, integrated environment. EPG-sim provides an extremely wide range of cost/accuracy trade-offs and a number of novel features compared to existing execution-driven systems. The parallelism and communication behavior of numerical application codes are studied via EPG-sim critical path simulation, which establishes the potential performance of prefetching and forwarding for these codes. The evaluation of prefetching and forwarding is accomplished via detailed EPG-sim execution-driven simulations of optimized, parallel versions of these application codes.
Two multiprocessor prefetching algorithms are presented and compared. A simple blocked vector prefetching algorithm, considerably less complex than existing software pipelined prefetching algorithms, is shown to be effective in reducing memory latency and increasing performance. A Forwarding Write operation is used to evaluate the effectiveness of forwarding. Data forwarding results in significant performance improvements over data prefetching for codes exhibiting less spatial locality. A new hybrid prefetching and forwarding scheme is presented that provides increased performance stability by adapting to varying application characteristics and architectural parameters. The hybrid scheme is shown to be effective in improving the performance of forwarding in reduced cache sizes. A compiler algorithm for data forwarding is presented that implements point-to-point forwarding, hybrid prefetching and forwarding, and selective forwarding. Software and hardware support for prefetching and forwarding are also discussed.
Issue Date: 1994
Rights Information: Copyright 1994 Poulsen, David Kristian
Date Available in IDEALS: 2011-05-07
Identifier in Online Catalog: AAI9512515
OCLC Identifier: (UMI)AAI9512515
