Files in this item

File:Joseph_Sloan.pdf (13MB)
Description:(no description provided)
Format:PDF (application/pdf)

Description

Title:Algorithmic approaches to enhancing and exploiting application-level error tolerance
Author(s):Sloan, Joseph
Director of Research:Kumar, Rakesh
Doctoral Committee Chair(s):Kumar, Rakesh
Doctoral Committee Member(s):Vaidya, Nitin H.; Gropp, William D.; Abraham, Jacob A.; Bronevetsky, Greg
Department / Program:Electrical & Computer Engineering
Discipline:Electrical & Computer Engineering
Degree Granting Institution:University of Illinois at Urbana-Champaign
Degree:Ph.D.
Genre:Dissertation
Subject(s):Fault Tolerance
Application-level Error Tolerance
Algorithmic Based Fault Tolerance (ABFT)
Application Robustification
Stochastic Processors
Reliability and Hardware Variability
Error localization
Partial Recomputation
Robust Sparse Linear Algebra
Algorithmic Selection for Error Resilience
Abstract:As late-CMOS process scaling leads to increasingly variable circuits/logic and as most post-CMOS technologies in sight appear to have largely stochastic characteristics, hardware reliability has become a first-order design concern. To make matters worse, emerging computing systems are becoming increasingly power-constrained. Traditional hardware/software approaches are likely to be impractical for these power-constrained systems due to their heavy reliance on redundant, worst-case, and conservative designs. The primary goal of this research has been to investigate how we can leverage inherent application and algorithm characteristics (e.g., natural error resilience, spatial and temporal reuse, and fault containment) to build more efficient robust systems. This dissertation describes algorithmic approaches that leverage application and algorithm awareness for building such systems. These approaches include (a) application-specific techniques for low-overhead fault detection, (b) an algorithmic approach for error correction using localization, (c) selection of scientific computing solver schemes to leverage application-level error resilience, and (d) a numerical optimization-based methodology for converting applications into a more error-tolerant form. This dissertation shows that application and algorithm awareness can significantly increase the robustness of computing systems, while also reducing the cost of meeting reliability targets.
Issue Date:2014-01-16
URI:http://hdl.handle.net/2142/46706
Rights Information:Copyright 2013 Joseph Augustyn Sloan
Date Available in IDEALS:2014-01-16
Date Deposited:2013-12

