Performance evaluation of a Ray-based MPI runtime for hybrid HPC-cloud systems
Song, Yifei
Permalink
https://hdl.handle.net/2142/129588
Description
Title
Performance evaluation of a Ray-based MPI runtime for hybrid HPC-cloud systems
Author(s)
Song, Yifei
Issue Date
2025-04-28
Director of Research (if dissertation) or Advisor (if thesis)
Kindratenko, Volodymyr
Department of Study
Electrical & Computer Engineering
Discipline
Electrical & Computer Engineering
Degree Granting Institution
University of Illinois Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Ray
MPI
HPC
Abstract
In this work, I present two contributions to the development and benchmarking of the Ray Collective Communication Library (CCL) and its integration with MPI workflows. First, I implemented examples that demonstrate how to adapt mpi4py code to the Ray CCL backend. This includes translating and validating key mpi4py test cases, focusing on fundamental operations such as send, receive, and broadcast, as well as MPI Group functionality. I demonstrated that the translated code produces outputs identical to those of the original mpi4py tests, confirming correctness for the tested cases. Adapting mpi4py code to the Ray CCL backend enables existing MPI applications to migrate to HPC-cloud environments where mpi4py may not be suitable. Mapping MPI communication semantics onto Ray CCL makes Ray’s advantages available, including elastic scheduling, fault tolerance, resource-aware execution, and cloud-native support, so that traditionally MPI-bound HPC applications can run more effectively on modern cloud infrastructure. Second, I designed and implemented a benchmarking suite that compares the performance of the Ray-based MPI communication backend with that of mpi4py. The suite characterizes the relative performance of the two communication paradigms, particularly in hybrid HPC-cloud scenarios where traditional MPI backends may face challenges. Through this validation and benchmarking work, I helped evaluate the feasibility of bridging traditional HPC communication patterns with cloud-native distributed computing frameworks. These contributions support the broader goal of enabling workload migration across diverse computing platforms.
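The abstract does not describe how the benchmarking suite is built. As a rough illustration only, the sketch below shows one way a backend-agnostic timing harness could be structured: it times a callable over several buffer sizes and reports the median wall-clock time per size. The names `time_operation` and `local_copy` are hypothetical and do not come from the thesis, and `local_copy` is a stand-in that merely copies a buffer in memory. In an actual comparison, the callable would presumably wrap a communication call from each backend, for example a broadcast in mpi4py and the corresponding Ray collective call.

```python
import statistics
import time
from typing import Callable, Dict, Iterable


def time_operation(op: Callable[[bytearray], None],
                   sizes: Iterable[int],
                   repeats: int = 20,
                   warmup: int = 3) -> Dict[int, float]:
    """Return the median wall-clock seconds of op(buffer) for each buffer size."""
    results: Dict[int, float] = {}
    for size in sizes:
        buf = bytearray(size)
        # Warm-up calls are executed but not timed.
        for _ in range(warmup):
            op(buf)
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            op(buf)
            samples.append(time.perf_counter() - start)
        # The median is less sensitive to occasional outliers than the mean.
        results[size] = statistics.median(samples)
    return results


def local_copy(buf: bytearray) -> None:
    """Hypothetical stand-in for a communication call: copies the buffer locally."""
    bytes(buf)


if __name__ == "__main__":
    # Assumed message sizes of 1 KiB and 1 MiB, chosen only for illustration.
    for size, seconds in time_operation(local_copy, [1 << 10, 1 << 20]).items():
        print(f"{size:>8} bytes  {seconds * 1e6:10.2f} us")
```

Because the harness accepts any callable, the same loop could time both backends with identical message sizes and repeat counts, which keeps the comparison like-for-like.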