Stochastic-Model-Driven Adaptation and Recovery in Distributed Systems
Joshi, Kaustubh Raghunandan
- distributed systems
- Dependability is becoming a requirement in an increasing number of domains, including those that were previously thought to be noncritical. Examples include large distributed systems deployed in domains such as e-commerce, information mining, messaging, and entertainment. Such systems pose a challenge to existing fault tolerance approaches because they require low-cost solutions that can be adapted to work with off-the-shelf components. At the same time, their scale makes it difficult to accurately diagnose faults and recover from them. This dissertation proposes a model-based approach to building a theoretically well-founded recovery framework based on partially observable Markov decision processes (POMDPs). The framework is inexpensive to deploy, can cope with a variety of recovery mechanisms, and can tolerate system monitoring that may be imperfect, imprecise, or conflicting. Moreover, it generates recovery decisions that ensure that recovery will be stable, provide guarantees on the success of the recovery, and recover the system while incurring as low a cost as possible, thus approximating optimality. We are unaware of any other framework for recovery in distributed systems that integrates monitoring and recovery in an iterative manner, deals with imprecise system states, selectively chooses actions that either gather information or make progress towards recovery, and generates recovery policies that minimize costs over entire sequences of recovery actions. We have implemented our approach in a tool called the "Adaptation and Recovery Management framework," and we demonstrate that this tool can be used to provide diagnosis and recovery capabilities in practical information systems.
- Copyright and License Information
- You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format, BUT this permission is only for a period of 45 (forty-five) days from the most recent time that you verified that this technical report is still available from the University of Illinois at Urbana-Champaign Computer Science Department under terms that include this permission. All other rights are reserved by the author(s).
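The abstract's central mechanism is maintaining a probability distribution (a belief) over possible fault states and refining it with imperfect monitor outputs and recovery actions. The standard POMDP belief update can illustrate this. The following is a minimal Python sketch under stated assumptions: the fault states (`ok`, `web_down`, `db_down`), the actions `ping_web` and `restart_web`, and all probabilities are hypothetical and are not taken from the dissertation or from the Adaptation and Recovery Management framework. It shows only belief tracking. It does not compute the cost-minimizing recovery policy the abstract describes.

```python
from typing import Dict, List

# Hypothetical fault hypotheses for a small system (illustrative only).
STATES: List[str] = ["ok", "web_down", "db_down"]


def belief_update(
    belief: Dict[str, float],
    action: str,
    obs: str,
    trans: Dict[str, Dict[str, Dict[str, float]]],
    obs_model: Dict[str, Dict[str, Dict[str, float]]],
) -> Dict[str, float]:
    """Standard POMDP belief update:
    b'(s') is proportional to O(obs | s', action) * sum_s T(s' | s, action) * b(s)."""
    unnormalized = {}
    for s_next in STATES:
        # Predict where the system may be after taking the action.
        predicted = sum(trans[action][s][s_next] * belief[s] for s in STATES)
        # Weight by how likely the monitor output is in that state.
        unnormalized[s_next] = obs_model[action][s_next][obs] * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in unnormalized.items()}


def identity() -> Dict[str, Dict[str, float]]:
    # Monitoring actions are assumed not to change the system state.
    return {s: {t: 1.0 if s == t else 0.0 for t in STATES} for s in STATES}


# Hypothetical recovery action: a restart fixes a web-tier fault 95% of the time.
restart = identity()
restart["web_down"] = {"ok": 0.95, "web_down": 0.05, "db_down": 0.0}

TRANS = {"ping_web": identity(), "restart_web": restart}

# Hypothetical imperfect monitor: the web ping gives false alarms 10% of the time
# and misses a real web-tier fault 10% of the time.
OBS = {
    "ping_web": {
        "ok": {"pass": 0.9, "fail": 0.1},
        "web_down": {"pass": 0.1, "fail": 0.9},
        "db_down": {"pass": 0.9, "fail": 0.1},
    },
    # The restart itself yields no informative observation.
    "restart_web": {s: {"none": 1.0} for s in STATES},
}

prior = {"ok": 0.5, "web_down": 0.25, "db_down": 0.25}
b1 = belief_update(prior, "ping_web", "fail", TRANS, OBS)    # b1["web_down"] is about 0.75
b2 = belief_update(b1, "restart_web", "none", TRANS, OBS)    # b2["web_down"] is about 0.0375
print(b1)
print(b2)
```

Here the failed ping (an information-gathering action) concentrates belief on the web-tier fault, and the restart (a recovery action) then moves most of that mass to `ok`. A full POMDP controller would choose between such actions by minimizing expected cost over entire action sequences, which this sketch does not attempt.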