Files in this item



REISNER-DISSERTATION-2019.pdf (application/pdf, 3MB)


Title: Robust structured multigrid at extreme scales
Author(s): Reisner, Andrew
Director of Research: Olson, Luke
Doctoral Committee Chair(s): Olson, Luke
Doctoral Committee Member(s): Moulton, David; Gropp, William; Kloeckner, Andreas
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Parallel Computing; Iterative Methods; High Performance Computing
Abstract: The solution of elliptic partial differential equations is a common performance bottleneck in scientific simulations. By exploiting structure in a problem, robust structured multigrid methods gain important performance benefits because they preserve structure throughout the multigrid hierarchy. In parallel, these methods benefit from nearest-neighbor, stencil-based communication patterns; however, the increased communication demands of the coarse-grid problems and block smoothers needed for a robust solver challenge parallel efficiency. In this dissertation, methods for reducing parallel communication through changes in the parallel implementation are explored. To reduce communication costs for coarse-grid problems, recursive agglomeration of tasks on a logically structured grid of processors is considered. This communication is optimized using a predictive performance model to guide how tasks are agglomerated. This approach provides an efficient strategy for parallel coarsening in a structured setting that can adapt to changes in the target architecture or multigrid algorithm through its incorporation in the performance model. Parallel results show favorable weak scaling using this strategy out to 500k cores and consistency of the performance model in quantifying the cost of various redistribution decisions. To reduce communication costs in block smoothers, an automated strategy for aggregating communication across blocks is considered. With minor changes to a block solver due to the introduction of a service abstraction layer, user-level threads are used to execute blocks concurrently so communication can be aggregated. This reduces the number of messages sent during a block smoothing operation. This strategy is demonstrated in plane smoothing to extend the strong scaling limit by reducing communication latency costs.
Parallel results demonstrate scalable multilevel relaxation with \(\log p\) communication complexity and plane relaxation with automated communication aggregation that doubles the strong scaling performance of a V-cycle. Lastly, the application of robust structured solvers to emerging heterogeneous architectures is considered. Benchmarks are used to develop a performance expectation for structured matrix-based operations on each target processing unit. OpenMP with unified memory is then used to offload solve-phase operations in the open-source, structured variational multigrid solver Cedar. The performance expectation is then used to provide context for performance gains by targeting GPUs on Sierra, a current Power9 system at Lawrence Livermore National Laboratory. Results show a speedup of a Cedar V-cycle targeting a V100 GPU over a Power9 CPU that is consistent with an approximate speedup estimated by comparing achievable memory bandwidth on each processing unit.
Issue Date: 2019-11-25
Rights Information: Copyright 2019 Andrew Reisner
Date Available in IDEALS: 2020-03-02
Date Deposited: 2019-12
