A Fault Tolerance Protocol for Fast Recovery
Doctoral Committee Chair(s)
Kale, Laxmikant V.
Department of Study
Degree Granting Institution
University of Illinois at Urbana-Champaign
This thesis presents research aimed at developing a fault-tolerance protocol that is relevant to parallel computing and provides fast restarts. We propose to combine the ideas of message logging and object-based virtualization. We leverage two facts: message-logging protocols do not require all processors to roll back when one processor crashes, and object-based virtualization allows work to be moved from one processor to another. We develop a message-logging protocol that operates in conjunction with object-based virtualization. We implement our protocol in the Charm++/AMPI run-time and evaluate that implementation. We use benchmarks and real-world applications to investigate and improve the performance of different aspects of our protocol. We also modify the load-balancing framework of the Charm++ run-time to work with the message-logging protocol. We show that, in the presence of faults, an application using our fault-tolerance protocol takes less time to complete than one using a traditional checkpoint-based protocol.
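The two ideas combined in the abstract can be illustrated with a small toy model. The sketch below is not the thesis's protocol and does not use the Charm++/AMPI API. It is a minimal single-process simulation in Python, and every name in it (`Obj`, `Runtime`, `send`, `crash_and_recover`) is hypothetical. It assumes that messages are logged on the sending side, so the log survives the crash of the receiver, and that message handling is deterministic. Under those assumptions, only the objects on the failed processor roll back to their checkpoint, and they are spread across the surviving processors before their logged messages are replayed.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    """Hypothetical migratable work unit; its state is a running sum."""
    name: str
    total: int = 0       # application state
    checkpoint: int = 0  # value of `total` at the last checkpoint

    def deliver(self, value: int) -> None:
        self.total += value


class Runtime:
    """Toy runtime: places objects on processors and logs every message."""

    def __init__(self, num_procs: int, names: list[str]) -> None:
        self.procs = set(range(num_procs))
        self.objects = {n: Obj(n) for n in names}
        # Round-robin initial placement of objects on processors.
        self.placement = {n: i % num_procs for i, n in enumerate(names)}
        # Assumption: this log stands in for sender-side memory, so it is
        # not lost when the destination's processor crashes.
        self.log: dict[str, list[int]] = {n: [] for n in names}

    def send(self, dest: str, value: int) -> None:
        self.log[dest].append(value)  # log first, then deliver
        self.objects[dest].deliver(value)

    def take_checkpoint(self) -> None:
        for obj in self.objects.values():
            obj.checkpoint = obj.total
        for entries in self.log.values():
            entries.clear()  # messages before the checkpoint are not needed

    def crash_and_recover(self, failed: int) -> list[str]:
        self.procs.discard(failed)
        survivors = sorted(self.procs)
        if not survivors:
            raise RuntimeError("no surviving processors")
        lost = [n for n, p in self.placement.items() if p == failed]
        for i, name in enumerate(lost):
            obj = self.objects[name]
            obj.total = obj.checkpoint  # only lost objects roll back
            # Spread the recovering objects over the survivors.
            self.placement[name] = survivors[i % len(survivors)]
            for value in self.log[name]:  # replay in the original order
                obj.deliver(value)
        return lost


rt = Runtime(3, ["a", "b", "c", "d", "e", "f"])
rt.send("a", 1)
rt.send("b", 2)
rt.take_checkpoint()
rt.send("a", 10)
rt.send("d", 5)
rt.send("b", 7)
recovered = rt.crash_and_recover(0)
```

In this run, processor 0 initially holds objects `a` and `d`. After the simulated crash, both are restored from their checkpoints, placed on processors 1 and 2 respectively, and brought up to date by replaying their logged messages. Object `b`, which lives on a surviving processor, is never rolled back. Spreading the lost objects over several survivors is what allows recovery work to proceed in parallel in a real system; this sequential toy shows only the placement, not the speedup.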