Achieving High Performance on Extremely Large Parallel Machines: Performance Prediction and Load Balancing
- Achieving High Performance on Extremely Large Parallel Machines: Performance Prediction and Load Balancing
- Zheng, Gengbin
- Issue Date
- parallel systems
- Parallel machines with an extremely large number of processors (at least tens of thousands processors) are now in operation. For example, the IBM BlueGene/L machine with 128K processors is currently being deployed. It is going to be a significant challenge for application developers to write parallel programs in order to exploit the enormous compute power available and manually scale their applications on such machines. Solving these problems involves finding suitable parallel programming models for such machines and addressing issues like load imbalance. In this thesis, we explore Charm++ programming model and its migratable objects for programming such machines and dynamic load balancing techniques to help parallel applications to easily scale on a large number of processors. We also present a parallel simulator that is capable of predicting parallel performance to help analysis and tuning of the parallel performance and facilitate the development of new load balancing techniques, even before such machines are built. We evaluate the idea of virtualization and its usefulness in helping a programmer to write applications with high degree of parallelism. We demonstrate it by developing several mini-applications with million-way parallelism. We show that Charm++ and AMPI (an extension to MPI) with migratable objects and support for load balancing are suitable programming model for programming very large machines. It is important to understand the performance of parallel applications on very large parallel machines. This thesis explores Parallel Discrete Event Simulation (PDES) techniques with an optimistic synchronization protocol to simulate parallel applications running on a very large number of processors. We optimize the synchronization protocol by exploiting the inherent determinacy that is normally found in parallel applications to reduce the synchronization overhead significantly.
- Type of Resource
- Copyright and License Information
- You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format, BUT this permission is only for a period of 45 (forty-five) days from the most recent time that you verified that this technical report is still available from the University of Illinois at Urbana-Champaign Computer Science Department under terms that include this permission. All other rights are reserved by the author(s).
Edit Collection Membership