|Title:||Communication and Synchronization in Parallel Computation|
|Author(s):||Jayasimha, Doddaballapur Narasimha-Murthy|
|Doctoral Committee Chair(s):||Loui, Michael C.; Lawrie, Duncan H.|
|Department / Program:||Computer Science|
|Degree Granting Institution:||University of Illinois at Urbana-Champaign|
|Abstract:||In this thesis we study communication and synchronization overheads in parallel algorithms executing on shared memory multiprocessors.
In the first part, we focus on the communication requirements of parallel algorithms. We define a communication complexity measure for parallel algorithms, modeling the execution of a parallel algorithm on a shared memory multiprocessor by a directed acyclic graph (DAG). We derive a lower bound on the communication for a particular class of graphs corresponding to nested iterative loops. For the DAG representing a triangular solver of size $n$, we show that the communication ($c$) and parallel time ($t$) satisfy $ct = \Omega(n^3)$, irrespective of the number of processors used. This result implies a trade-off between computation time and communication. Finally, we define a synchronization complexity measure.
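The DAG-based communication measure can be made concrete with a toy sketch. The encoding below (edge lists, a processor-assignment map, and counting communication as the edges whose endpoints run on different processors) is a hypothetical illustration of the general idea, not the thesis's formal measure.

```python
# Toy model: a parallel computation is a DAG of tasks; "assign" maps each
# task to a processor. Communication is counted as the number of DAG edges
# whose endpoints are scheduled on different processors. All names here
# (dag, assign, communication) are illustrative, not from the thesis.

def communication(dag, assign):
    """Count edges (u, v) whose endpoints run on different processors."""
    return sum(1 for u, v in dag if assign[u] != assign[v])

# A tiny two-level reduction DAG: a, b feed c; c, d feed e.
dag = [("a", "c"), ("b", "c"), ("c", "e"), ("d", "e")]

# Everything on one processor: no communication.
one_proc = {node: 0 for node in "abcde"}
print(communication(dag, one_proc))   # 0

# Split across two processors: edges crossing the boundary communicate.
two_proc = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0}
print(communication(dag, two_proc))   # 2  (b->c and d->e cross)
```

Minimizing parallel time pushes tasks onto many processors, which cuts more edges; the $ct = \Omega(n^3)$ result formalizes that tension for the triangular-solver DAG.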
In the second part, we consider the commonly required forms of synchronization in a multiprocessor: barrier, reporting, and mutual exclusion. We study three schemes to implement barrier synchronization and develop analytical models to estimate their performance. For $n$ processes, the first scheme requires $O(n \log n)$ accesses, as opposed to $2n - 1$, the minimum number of accesses. The second scheme, which uses a synchronization tree, requires $O(\log n)$ accesses to any node of the tree. The third scheme has a linear growth in the number of accesses (approximately $3n$). We study two of the synchronization schemes through simulation. The simulator is trace driven and is based on the Omega interconnection network. The simulation results are in broad agreement with our analytical results.
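To make the barrier idea concrete, here is a minimal sense-reversing counter barrier. This is a generic textbook scheme, not any of the thesis's three implementations, and it models only correctness, not the access counts analyzed above.

```python
import threading
import time

# Sense-reversing counter barrier: a shared counter tracks arrivals, and a
# global "sense" flag flips once per phase so the barrier can be reused
# without re-initialization. Illustrative sketch only.

class SenseBarrier:
    def __init__(self, n):
        self.n = n                      # number of participating processes
        self.count = n                  # arrivals still awaited this phase
        self.sense = False              # global sense flag, flipped per phase
        self.lock = threading.Lock()
        self.local = threading.local()  # per-thread local sense

    def wait(self):
        my_sense = not getattr(self.local, "sense", False)
        self.local.sense = my_sense
        with self.lock:
            self.count -= 1
            last = (self.count == 0)
            if last:
                self.count = self.n     # reset for the next phase
                self.sense = my_sense   # release the spinning processes
        while not last and self.sense != my_sense:
            time.sleep(0)               # spin until the phase completes

# Usage: workers record their phase number only after every worker has
# arrived, so each phase's entries are grouped together in `results`.
results = []
def worker(bar, phases):
    for p in range(phases):
        bar.wait()
        results.append(p)

n, phases = 4, 3
bar = SenseBarrier(n)
threads = [threading.Thread(target=worker, args=(bar, phases)) for _ in range(n)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
```

Each process makes a constant number of counter updates plus some spin reads per phase, which is the flavor of cost the analytical models above quantify precisely.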
We next introduce a new synchronization primitive, the distributed synchronizer. An efficient implementation of this primitive requires simple hardware enhancements at the switching elements of the processor-memory multistage interconnection network. The primitive implements reporting and barrier synchronization with negligible overhead, and its semaphore operations are bounded fair.
Finally, we study bounded fairness. Bounded fairness is stronger than the usual notion of fairness based on eventuality. We give applications of the bounded fairness concept. We formalize bounded fairness by introducing a new operator, $\langle k \rangle$, into temporal logic. We show that the new logic is more powerful than temporal logic with the eventuality operator, and that it is complete and satisfiable.
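The distinction between eventuality and bounded fairness can be checked on finite traces. The encoding below (a trace as a list of sets of events, with hypothetical event names `"req"` and `"grant"`) is an illustration of the intuition, not the thesis's temporal-logic formalism.

```python
# Ordinary fairness: every pending request is *eventually* granted.
# k-bounded fairness: every pending request is granted within k steps.
# The trace encoding and event names are illustrative assumptions.

def eventually_granted(trace, request, grant):
    """Every occurrence of `request` is followed, at any distance, by `grant`."""
    for i, step in enumerate(trace):
        if request in step and not any(grant in s for s in trace[i:]):
            return False
    return True

def granted_within(trace, request, grant, k):
    """Every occurrence of `request` is followed by `grant` within k steps."""
    for i, step in enumerate(trace):
        if request in step and not any(grant in s for s in trace[i:i + k + 1]):
            return False
    return True

trace = [{"req"}, set(), set(), {"grant"}]
print(eventually_granted(trace, "req", "grant"))  # True
print(granted_within(trace, "req", "grant", 2))   # False: grant is 3 steps away
print(granted_within(trace, "req", "grant", 3))   # True
```

The trace above satisfies eventuality but violates 2-bounded fairness, which is exactly the sense in which the $\langle k \rangle$ notion is stronger.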
Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 1988.
|Date Available in IDEALS:||2014-12-15|