Files in this item



application/pdfECE499-Sp2017-hutter-Edward.pdf (577kB)
(no description provided)PDF


Title:QR factorization over tunable processor grids
Author(s):Hutter, Edward
Contributor(s):Solomonik, Edgar
Subject(s):communication cost
parallel numerical algorithms
least squares problems
QR factorization
Abstract:The increasing complexity of modern computer architectures has greatly influenced algorithm design. Algorithm performance on these architectures is now determined by the movement of data. Therefore, modern algorithms should prioritize minimizing communication. In this work, we present a new parallel QR factorization algorithm solved over a tunable processor grid in a distributed memory environment. The processor grid can be tuned between one and three dimensions, resulting in tradeoffs in the asymptotic costs of synchronization, horizontal bandwidth, flop count, and memory footprint. This parallel algorithm is the first to efficiently extend the Cholesky-QR2 algorithm to matrices with an arbitrary number of rows and columns. Along its critical path of execution on P processors, our tunable algorithm improves upon the horizontal bandwidth cost of the existing Cholesky-QR2 algorithm by up to a factor of c^2 when solved over a c x d x c processor grid subject to P = c^2 d and E[1,P^1/3]. The costs attained by our algorithm are asymptotically equivalent to state-of-the-art QR factorization algorithms that have yet to be implemented. We argue that ours achieves better practicality and flexibility while still attaining minimal communication.
Issue Date:2017-05
Date Available in IDEALS:2017-08-22

This item appears in the following Collection(s)

Item Statistics