Files in this item

CALHOUN-DISSERTATION-2017.pdf (application/pdf, 5MB)

Title: From detection to optimization: impact of soft errors on high-performance computing applications
Author(s): Calhoun, Jon Cameron
Director of Research: Snir, Marc
Doctoral Committee Chair(s): Olson, Luke N.
Doctoral Committee Member(s): Gropp, William; Cappello, Franck
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): High-performance computing; Fault tolerance; Silent data corruption; Soft errors; Error detection; Error recovery; Fault injection; Error propagation; Lossy compression
Abstract: As high-performance computing (HPC) continues to progress, constraints on HPC system design force the handling of errors to higher levels in the software stack. Of the types of errors facing HPC, soft errors that silently corrupt system or application state are among the most severe. Understanding the behavior of HPC applications in the presence of soft errors is critical for effective utilization of HPC systems. This understanding can guide the development of algorithm-based error detection informed by application characteristics obtained from fault injection and error propagation studies. Furthermore, the realization that applications are tolerant to small errors allows optimizations such as lossy compression on high-cost data transfers. Lossy compression adds small, user-controllable amounts of error when compressing data, reducing data size before expensive data transfers and thereby saving time. This dissertation investigates and improves the resiliency of HPC applications to soft errors, and explores lossy compression as a new form of optimization for expensive, time-consuming data transfers.
Issue Date: 2017-07-12
Rights Information: Copyright 2017 Jon Cameron Calhoun
Date Available in IDEALS: 2017-09-29
Date Deposited: 2017-08