|Abstract:||In the modern era of computing, processors are increasingly susceptible to soft errors. Current solutions in both hardware and software enable error detection and correction. Some of these errors, however, go unnoticed by detectors and manifest as silent data corruptions (SDCs) at the application level. Injecting errors into the system and evaluating the outcomes is one method to uncover SDC-causing errors and determine an application's overall resilience to soft errors. Because the number of locations where errors may appear is large, this approach requires many injection experiments.
One resiliency analysis tool, Relyzer, addresses this issue by performing a comprehensive program analysis to create a small subset of the error injection experiments that can account for the entire application. The limitation of Relyzer is that its current analysis can only be performed on one hardware instruction set architecture (ISA). Software, however, is usually compiled to multiple ISAs in order to support users with varying hardware configurations.
The primary contribution of this thesis is an open-source version of Relyzer implemented using the gem5 simulator. This makes it possible to analyze multiple ISAs and, in the long term, to support multiple hardware configurations. Specifically, in this work, we develop support for x86. We also evaluate applications across ISAs by generating error resiliency profiles for both x86 and SPARC. After studying five workloads from different domains, we find that, in general, application soft error resiliency varies with the choice of ISA. For the applications we studied, the percentage of static instructions that yield SDCs is, on average, 68% for x86 and 60% for SPARC. Furthermore, this work opens doors to future research in application-level soft error resiliency analysis.