Files in this item



application/pdfsack_paul.pdf (973kB)
(no description provided)PDF


Title:Scalable collective message-passing algorithms
Author(s):Sack, Paul
Director of Research:Gropp, William D.
Doctoral Committee Member(s):Kale, Laxmikant V.; Padua, David A.; Snir, Marc
Department / Program:Computer Science
Discipline:Computer Science
Degree Granting Institution:University of Illinois at Urbana-Champaign
collective algorithms
Abstract:Governments, universities, and companies expend vast resources building the top supercomputers. The processors and interconnect networks become faster, while the number of nodes grows exponentially. Problems of scale emerge, not least of which is collective performance. This thesis identifies and proposes solutions for two major scalability problems. Our first contribution is a novel algorithm for process-partitioning and remapping for exascale systems that has far better time and space scaling than known algorithms. Our evaluations predict an improvement of up to 60x for large exascale systems and arbitrary reduction in the large temporary buffer space required for generating new communicators. Our second contribution consists of several novel collective algorithms for Clos and torus networks. Known allgather, reduce-scatter, and composite algorithms for Clos networks suffer the worst congestion when the largest messages are exchanged, damaging performance. Known algorithms for torus networks use only one network port, regardless of how many are available. Unlike known algorithms, our algorithms have a small amount of redundant communication. Unlike known algorithms, our algorithms can be reordered so that congestion hinders small messages rather than large, and all ports can be fully used on multi-port torus networks. The redundant communication gives us this flexibility. On a 32k-node system, we deliver improvements of up to 11x for the reduce-scatter operation, when the native reduce-satter algorithm does not use special hardware, and 5.5x for the allgather operation. We show large improvements over native algorithms with as few as 16 processors.
Issue Date:2012-02-06
Genre:Dissertation / Thesis
Rights Information:Copyright 2011 Paul Sack
Date Available in IDEALS:2012-02-06
Date Deposited:2011-12

This item appears in the following Collection(s)

Item Statistics