Files in this item

File: ROBSON-DISSERTATION-2020.pdf (2MB) (Restricted to U of Illinois)
Description: (no description provided)
Format: PDF

Description

Title: Techniques for communication optimization of parallel programs in an adaptive runtime system
Author(s): Robson, Michael P
Director of Research: Kale, Laxmikant V
Doctoral Committee Chair(s): Kale, Laxmikant V
Doctoral Committee Member(s): Torrellas, Josep; Zilles, Craig; Quinn, Thomas
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: Ph.D.
Genre: Dissertation
Subject(s): adaptive runtime
hpc
high performance computing
communication optimization
spreading
Charm
Charm++
Abstract: With the continuation of Moore’s law and the presumed end of single-core performance improvements, high performance computing (HPC) has turned to increased on-node parallelism in order to put its ever-growing transistor budgets to use on ever-larger problems. While this has yielded a continued increase in overall compute performance, supercomputer networks have lagged far behind in their development and are now often the single bottleneck to performance and scalability in modern HPC applications. New machines are consistently built with ‘deeper’ nodes that improve single-node compute performance, as measured by achievable floating point operations per second (FLOPs), relative to earlier generations, without a corresponding increase in network bandwidth or a sufficient decrease in latency. This imbalance has previously been partially addressed by partitioning duties between a runtime at the shared-memory node level, e.g. OpenMP, and a distributed-memory communication layer, e.g. MPI, in a model known as MPI+X. In this work, we present an alternative approach to improving the performance of modern HPC applications on current-generation supercomputer networks. We focus on combining several benefits of the Charm++ programming model, namely overdecomposition, with OpenMP and the ability to ‘spread’ work across several cores. This allows applications to inject messages onto the network smoothly, constantly overlapping their communication requirements with their compute phases, the overall focus of this work. We further describe a complementary suite of techniques to fully utilize modern supercomputers and balance FLOPs against communication. We extend these techniques through micro-benchmark studies and integration into the production-scale Charm++ runtime.
We then turn our attention from internode communication optimization and apply these same techniques to intranode communication between hardware devices, i.e. CPUs and graphics processing units (GPUs). We also discuss many of the tradeoffs of these approaches and attempt to quantify their general effect. While embodied in the Charm++ runtime system, these ideas are applicable to a wide swath of communication-bound applications, a class of programs that we expect will only grow as the gap between node and network performance continues to widen.
Issue Date: 2020-07-16
Type: Thesis
URI: http://hdl.handle.net/2142/108622
Rights Information: Copyright 2020 Michael P. Robson
Date Available in IDEALS: 2020-10-07
Date Deposited: 2020-08

