Files in this item

File: Mei_Chao.pdf (3MB)
Description: (no description provided)
Format: PDF (application/pdf)

Description

Title: Message-driven parallel language runtime design and optimizations for multicore-based massively parallel machines
Author(s): Mei, Chao
Director of Research: Kale, Laxmikant V.
Doctoral Committee Chair(s): Kale, Laxmikant V.
Doctoral Committee Member(s): Padua, David A.; Torrellas, Josep; Balaji, Pavan
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: Ph.D.
Genre: Dissertation
Subject(s): Multicore shared-memory optimizations; Multithreaded adaptive parallel language runtime; MPI+OpenMP; High Performance Computing (HPC); Load balancing; Parallel programming; Molecule dynamics simulation performance; Charm++
Abstract: Multicore chips have become the standard building blocks for all current and future massively parallel machines. Much work has been done in scientific and engineering HPC applications to exploit shared-memory multicore nodes. This thesis, in contrast, focuses on the parallel language runtime system, the software layer that supports the execution of parallel applications. The essential idea is to parallelize the language runtime with threads, a natural extension of the approach that applications already take to exploit the shared memory on a multicore node. Using the asynchronous message-driven CHARM++ runtime system as an evaluation platform, we address the key question of how the runtime should be designed and optimized for multicore nodes on parallel machines so that applications running atop it achieve better performance with as few changes as possible. Since runtime performance on a single node is the basis for overall runtime performance at scale, we have identified the key factors for the runtime to run well on a single node and developed corresponding optimization techniques. We have also developed the CkLoop library in the CHARM++ runtime, which demonstrates the need for a unified runtime that better supports parallelism at different granularities. Furthermore, we have explored the design space for assigning work responsibilities among the threads in the multithreaded runtime. In the context of a runtime design with dedicated communication threads, we have investigated the resulting communication issues with the help of our extension to a performance analysis tool, and proposed methods that resolve them. To achieve even better application performance, we have shown how developers can leverage new capabilities offered by the runtime, and developed new load balancing strategies that are more effective on multicore platforms.
Finally, we have demonstrated performance improvements on real production-level scientific applications, including NAMD, a widely used molecular dynamics simulation program, by using this multithreaded runtime on petascale massively parallel machines. For the 100M-atom STMV simulation, the multithreaded runtime gives NAMD about a twofold performance improvement on 224,076 cores of JaguarPF (Cray XT5) and about a threefold improvement in machine utilization on Intrepid (BlueGene/P). It also allows NAMD to scale up to the full machine on both JaguarPF and Titan (Cray XK6).
Issue Date: 2012-09-18
URI: http://hdl.handle.net/2142/34238
Rights Information: Copyright 2012 Chao Mei
Date Available in IDEALS: 2012-09-18
Date Deposited: 2012-08

