Files in this item

SAMFASS-THESIS-2016.pdf (application/pdf, 2 MB), Restricted Access

Title: Towards a deeper understanding of hybrid programming
Author(s): Samfass, Philipp Johannes
Advisor(s): Gropp, William D.; Olson, Luke N.
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): hybrid programming models
performance modeling
ping pong benchmark
Message-Passing Interface (MPI)
MPI shared memory
parallel computing
sparse matrix-vector multiplication
Abstract: With the end of Dennard scaling, future high-performance computers are expected to consist of distributed nodes comprising ever more cores with direct access to shared memory on a node. However, many parallel applications still use a pure message-passing programming model based on the Message-Passing Interface (MPI) and thereby potentially fail to make optimal use of shared memory resources. As argued in this work, the pure message-passing approach is not necessarily the best fit for current and future supercomputing architectures. In this thesis, I therefore present a detailed performance analysis of so-called hybrid programming models, which aim to improve performance by combining a shared memory model with the message-passing model on current symmetric multiprocessor (SMP) systems. First, inter-node communication performance is investigated in the context of (hybrid) message-passing programs. A novel performance model for estimating communication performance on current SMP nodes is presented. In contrast to the commonly used classic postal performance model, the new model is shown to predict inter-node communication performance more accurately in the presence of simultaneously communicating processes and saturation of the network interface controller on current multicore architectures. The implications of the new model for hybrid programs are discussed. In addition, I demonstrate the (current) difficulties of multithreaded MPI communication based on results obtained for a multithreaded ping pong benchmark. Moreover, I show how intra-node MPI communication performance can be significantly improved for small- to medium-size messages by avoiding message-passing overhead and/or through superior cache usage. This is achieved by a direct copy in shared memory using either the hybrid MPI+MPI or the MPI+OpenMP programming method.
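For reference, the classic postal model mentioned in the abstract predicts the time to transmit a message of n bytes between two processes as a linear function of message size. In conventional notation (the symbols below are the standard ones and not necessarily the thesis's own):

    T(n) = α + β·n

where α is the per-message startup latency and β is the transfer time per byte (the inverse of the asymptotic bandwidth). The abstract's critique is that this model ignores contention: when several processes on a node communicate simultaneously, the shared network interface controller can saturate, so measured times exceed the postal prediction. The new model presented in the thesis accounts for this effect.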
Furthermore, I contrast and evaluate in depth several (pure and hybrid) implementation options for a structured-grid sparse matrix-vector multiplication. These options differ in how hybrid parallelism is exploited at the application level (coarse-grained vs. fine-grained problem decomposition) and in the hybrid programming system used (pure MPI vs. MPI+MPI vs. MPI+OpenMP). I discuss performance factors such as locality, overhead, efficient use of MPI's derived datatypes, and the serial fraction in Amdahl's law. Moreover, I experimentally demonstrate how a coarse-grained hybrid application design can be used to control these factors, yielding significant performance improvements over a pure MPI parallelization in communication and/or synchronization for both the hybrid MPI+MPI and MPI+OpenMP parallel programming approaches and for different grid decompositions.
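As a concrete illustration of the application kernel evaluated in the abstract, the following is a minimal serial sketch of a structured-grid sparse matrix-vector product, using a 2D 5-point Laplacian stencil applied matrix-free. The function name, the choice of stencil, and the Dirichlet boundary treatment are illustrative assumptions, not the thesis's implementation; the hybrid variants discussed above would additionally decompose the grid across MPI processes and/or OpenMP threads and exchange halo (ghost) values before each application.

```python
def stencil_matvec(x, nx, ny):
    """Compute y = A x, where A is the 2D 5-point Laplacian on an
    nx-by-ny structured grid with homogeneous Dirichlet boundaries.
    The matrix is never formed: each row's five nonzeros are applied
    directly from the stencil pattern (matrix-free sparse matvec)."""
    y = [0.0] * (nx * ny)
    for j in range(ny):
        for i in range(nx):
            k = j * nx + i          # row-major grid index
            v = 4.0 * x[k]          # diagonal entry
            if i > 0:               # west neighbor
                v -= x[k - 1]
            if i < nx - 1:          # east neighbor
                v -= x[k + 1]
            if j > 0:               # south neighbor
                v -= x[k - nx]
            if j < ny - 1:          # north neighbor
                v -= x[k + nx]
            y[k] = v
    return y
```

In a fine-grained hybrid version, the outer loop would typically be threaded; in a coarse-grained version, each process or thread would own a subdomain of the grid plus a one-cell halo.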
Issue Date:2016-07-18
Rights Information:Copyright 2016 Philipp Samfass
Date Available in IDEALS:2016-11-10
Date Deposited:2016-08
