Files in this item
|(no description provided)|
|Title:||Task flow: A novel approach to fine-grain wafer-scale parallel computing|
|Author(s):||Horst, Robert Whiting|
|Doctoral Committee Chair(s):||Patel, Janak H.|
|Department / Program:||Computer Science|
|Degree Granting Institution:||University of Illinois at Urbana-Champaign|
|Subject(s):||Engineering, Electronics and Electrical
|Abstract:||This thesis introduces a parallel computer architecture known as task flow. Simple replicated cells contain both memory packets and processing logic. Each memory packet contains a data field, the next instruction, and a link. Transmission packets flow between memory packets to communicate instructions and temporary data values. Program execution is performed by multiple tasks that communicate through memory variables. Each task may remain local to one cell for extended periods or may flow between cells with as few as one instruction executed in each cell.
SWIFT is a register-transfer simulation that emulates program execution in a ring of task flow cells. SWIFT runs programs generated by a new assembly language, LIFT. Program fragments written in LIFT explore task flow programming of distributed reduction operations, sparse matrix arithmetic, and associative memory. The performance of coarse-grain application synchronization is modeled using test-and-set locking, full/empty bits, and a new locking method called rotating critical sections.
A new linear-array wafer scale integration architecture has been developed for task flow computing. The architecture has low area overhead, yet is effective in tolerating defective cells on the wafer. A simple configuration algorithm guarantees inclusion of all reachable functional cells. The fixed number of logic levels between cells, and phase shifted synchronous clocking, help to reduce cycle times.
The performance of a sparse matrix-vector multiplication program is evaluated with different matrix sizes and densities. A 100-cell SWIFT simulation of a 1,000 by 1,000 sparse matrix shows a speedup of 79 over the best serial algorithm. These results, when extrapolated to the simulation of artificial neural networks, predict a performance of 759 million connections-per-second with 1,000 neurons and 100,000 connections.
The thesis concludes with a discussion of ways in which the basic architecture may be extended through different interconnection networks, increased pipelining, improved memory organization, and other architectural changes. The thesis is put into perspective by comparing task flow to other parallel architectures and by comparing SWIFT to other parallel machines.
|Rights Information:||Copyright 1991 Horst, Robert Whiting|
|Date Available in IDEALS:||2011-05-07|
|Identifier in Online Catalog:||AAI9136618|