Abstract: As hardware performance and dependability have improved dramatically over the past few decades, software dependability issues have become increasingly important. Unfortunately, many studies show that software bugs, which inevitably slip through bug detection methods and even the strictest pre-release testing, can greatly affect software dependability during production runs. To improve software dependability during production runs, this dissertation proposes to address software bugs at multiple levels by leveraging support from the underlying hardware, the OS kernel, and the middle-layer runtime.
The proposed multi-level defenses address software bugs and their effects at different stages of program execution. The first-level defense detects software bugs as soon as they are triggered. Detection at this earliest stage can effectively prevent further propagation of the errors that the bugs cause. Ideally, the first-level defense would detect all software bugs. However, some bugs may still slip through it and may be exploited by security attacks. The second-level defense therefore detects the exploitation of software bugs in order to limit the system damage caused by potentially exploited bugs. Because of the limitations of the tools and methods deployed in the first two levels, some bugs may still escape them. Moreover, without further action on the bugs or exploits detected at those two levels, all the target system can do is shut itself down to prevent potential damage, which makes it unavailable to users. The third-level defense therefore recovers the program from software bugs and their effects, thus providing non-stop service. In short, the multi-level defenses complement each other to effectively address software bugs during production runs.
More specifically, at each level of defense, this dissertation proposes a novel low-overhead method that addresses software bugs during production runs by leveraging support from the hardware, the OS, or the runtime. In the first-level defense, this dissertation proposes a low-overhead tool, called SafeMem, to detect memory leaks and memory corruption, two major forms of software bugs that severely threaten system availability and security. SafeMem does not require any new hardware extensions. Instead, it makes novel use of existing ECC memory technology and exploits intelligent analysis of dynamic memory usage behavior to detect memory leak and memory corruption bugs. Experiments with seven real-world applications show that SafeMem detects all tested bugs with very low overhead (only 1.6%-14.4%).
In the second-level defense, this dissertation proposes a low-overhead, software-only information flow tracking system, called LIFT, to detect the exploitation of software bugs. Without requiring any hardware changes, LIFT detects various types of security attacks and minimizes runtime overhead through dynamic binary translation and optimization. More specifically, LIFT aggressively eliminates unnecessary dynamic information tracking, coalesces information checks, and efficiently switches between the target program and the instrumented information flow tracking code. Experiments with two real-world server applications, one client application, and eighteen attack benchmarks show that LIFT can effectively detect various types of security attacks. LIFT also incurs low overhead: only 6.2% for the server applications and 3.6 times on average for seven SPEC INT2000 applications. The proposed dynamic optimizations reduce the overhead by a factor of 5-12.
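The basic idea behind information flow tracking can be illustrated with a toy example. The sketch below is not LIFT itself: it involves no binary translation and none of the optimizations described above, and every name in it (`TaintTracker`, `SecurityAlert`, the location names) is hypothetical. It only shows, under those assumptions, how a taint tag can follow untrusted input through computations and trigger an alert when tainted data reaches a sensitive use such as a jump target.

```python
from dataclasses import dataclass, field


class SecurityAlert(Exception):
    """Raised when tainted data reaches a sensitive use (hypothetical name)."""


@dataclass
class TaintTracker:
    """Toy tracker: one boolean tag (True = tainted) per named location."""
    tags: dict = field(default_factory=dict)

    def mark_input(self, dst):
        # Data arriving from an untrusted source is tainted.
        self.tags[dst] = True

    def load_const(self, dst):
        # Constants embedded in the program are treated as trusted.
        self.tags[dst] = False

    def move(self, dst, src):
        # A copy carries the source's tag with it.
        self.tags[dst] = self.tags.get(src, False)

    def binop(self, dst, src1, src2):
        # The result is tainted if either operand is tainted.
        self.tags[dst] = self.tags.get(src1, False) or self.tags.get(src2, False)

    def check_jump_target(self, loc):
        # Sensitive use: jumping through tainted data is flagged.
        if self.tags.get(loc, False):
            raise SecurityAlert(f"tainted data in {loc} used as jump target")


def demo():
    """Return True if the tracker flags the tainted jump target."""
    t = TaintTracker()
    t.load_const("r1")          # trusted value
    t.mark_input("buf")         # untrusted input
    t.binop("r2", "r1", "buf")  # r2 now depends on the input
    t.check_jump_target("r1")   # untainted, so no alert
    try:
        t.check_jump_target("r2")
    except SecurityAlert:
        return True
    return False
```

A per-operation tag update like this is the kind of bookkeeping whose cost a real system has to reduce, for example by skipping it where no tainted data can be involved.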
In the third-level defense, this dissertation proposes an innovative technique, called Rx, which can quickly recover programs from many types of software bugs, both deterministic and non-deterministic. The idea, inspired by allergy treatment in real life, is to roll the program back to a recent checkpoint once a failure, or the triggering or exploitation of a software bug, is detected by the first two levels of defense, and then re-execute the program in a modified environment. This idea is based on the observation that many bugs are correlated with their execution environments and can therefore be avoided by removing the "allergen" from the environment. Rx requires few or no modifications to applications and provides programmers with additional feedback for bug diagnosis. Experiments with four server applications that contain six bugs of various types show that Rx can survive all six software failures and provide transparent, fast recovery within 0.017-0.16 seconds, 21-51 times faster than the whole system program restart approach for all but one case (CVS).
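The rollback-and-re-execute idea can be sketched in a few lines. This is only an illustration under stated assumptions, not the Rx implementation: checkpointing is simulated with an in-memory deep copy, the "environment" is a plain dictionary, and the names `run_with_recovery`, `handle`, and `padding` are hypothetical.

```python
import copy


def run_with_recovery(step, state, request, environments):
    """Run `step` under each environment in turn, rolling back on failure.

    Returns (new_state, environment_that_worked). The caller's `state`
    is never modified; it plays the role of the checkpoint.
    """
    checkpoint = copy.deepcopy(state)
    for env in environments:
        attempt = copy.deepcopy(checkpoint)  # roll back to the checkpoint
        try:
            step(attempt, request, env)      # re-execute in this environment
            return attempt, env
        except Exception:
            continue                         # try the next environment change
    raise RuntimeError("no environment change avoided the failure")


def handle(state, request, env):
    """Toy request handler with a simulated buffer overflow."""
    state["log"].append(request)             # partial effect before the failure
    capacity = 4 + env["padding"]            # padding models a changed allocator
    if len(request) > capacity:
        raise MemoryError("simulated buffer overflow")


# First try the original environment, then one with padded buffers.
ENVIRONMENTS = [{"padding": 0}, {"padding": 8}]
```

For example, `run_with_recovery(handle, {"log": []}, "abcdefgh", ENVIRONMENTS)` fails under the first environment, discards the partial update, and succeeds under the padded one. The environment that worked also hints at the kind of bug involved, which mirrors the diagnostic feedback mentioned above.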