Files in this item



application/pdf: 9124404.pdf (7MB), restricted to U of Illinois


Title: Concurrent checkpointing for fast recovery in object-based systems
Author(s): DeGroat, Joanne Elizabeth
Doctoral Committee Chair(s): Davidson, Edward S.
Department / Program: Electrical and Computer Engineering
Discipline: Electrical and Computer Engineering
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Engineering, Electronics and Electrical; Computer Science
Abstract: Traditional checkpoint and recovery are based upon two basic assumptions. The first is the need to halt the computation in progress to save the state of the computation, i.e., take the checkpoint. The second assumption is that the entire state needs to be saved. These assumptions introduce fixed overhead into the system to take the checkpoint and consume space for variables whose state need not be saved. This research investigates a means of breaking these assumptions by developing an architecture that is capable of transparently saving the state of the executing process and of saving only that information required for recovery should an error occur. It also investigates a method of intermediate level recovery, i.e., recovery at levels above a single instruction and lower than that of checkpointing.
A model of computation is developed first to examine the nature and behavior of programs. The model breaks a program into basic blocks, segments of maximal in-line code. Intermediate blocks, blocks composed of several basic blocks, provide a representation at a higher level than basic blocks. At all levels, the model reveals significant information about computations and indicates an approach for the architecture.
The architecture, called the recovery architecture, is based on the concepts of objects and employs capability addressing. A means of transparently saving state, based on capability addressing, is developed. The method is called concurrent checkpointing and saves only that information required for recovery.
An evaluation of the architecture shows that it is capable of very fast recovery should an error occur. At a fine level of granularity where the program is broken into numerous blocks, the expected time of execution, even under very high error rates, may be close to the time of execution when no errors occur. The architecture has application in areas such as aircraft flight control and tracking systems where the expected time of execution of tasks is critical.
Issue Date: 1991
Rights Information: Copyright 1991 DeGroat, Joanne Elizabeth
Date Available in IDEALS: 2011-05-07
Identifier in Online Catalog: AAI9124404
OCLC Identifier: (UMI)AAI9124404
