Files in this item
|(no description provided)|
|Title:||A Theory for Algorithm-Based Fault Tolerance in Array Processor Systems|
|Department / Program:||Electrical Engineering|
|Degree Granting Institution:||University of Illinois at Urbana-Champaign|
|Abstract:||The concept of algorithm-based fault-tolerance deals with system-level methods for obtaining reliable results from computations, especially when performed on array processor systems. In this scheme, algorithms have their outputs encoded in a system-level error-detecting or -correcting code. The algorithms rearrange the computations to allow an array processor system with a faulty processor to produce either a noncodeword output or the correct output. The algorithms also provide appropriate error-checking procedures which operate on the high-level encoded data.
This thesis deals with a theoretical study of the scheme of algorithm-based fault tolerance and addresses four issues. First, it deals with some design issues of specific fault-tolerant and fault-secure schemes. Algorithms are classified into broad classes called paradigms which are determined exclusively by the communication patterns of the processors. Fault-secure techniques are presented for three powerful paradigms: the multiplex, the recursive combination, and the multiplex-demultiplex paradigms.
The second part deals with the development of a model which can be used to analyze the fault-detecting and -locating capabilities of such algorithms. The model uses a broad interpretation of errors, faults and checks, which are represented as a tripartite graph. Three parameters are introduced to characterize the fault-tolerance scheme: the closure index, the masking index and the exposure index. Necessary and sufficient conditions for detecting and locating faults in processors during the actual computations are discussed. The model takes into account the data flow during a computation and can, therefore, pinpoint exactly which faulty processors can affect which data elements.
In the third part, some graph-theoretic bounds are presented on various useful characteristics in algorithm-based fault tolerance. The model is used to determine bounds on the number of data elements that a processor may affect while allowing t-fault detection or t-fault location. Using these results, some upper and lower bounds are presented on the number of checks required to achieve detection or location. Finally, in order to estimate the overhead required in this fault-tolerant scheme, some bounds are derived on the number of processors and the time required for the execution of the checks.
The last part of the thesis deals with a probabilistic study of the scheme. Expressions for the reliability of the results of the computation, and the time for the completion of the computation using a particular algorithm, are derived in terms of various parameters: the number of processors involved, the time for execution, and the fault-detecting, -locating, and -tolerating capabilities of the algorithm. Using the results of the probabilistic model, various alternative approaches of algorithm-based fault tolerance for a given problem are effectively compared.
Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 1985.
|Date Available in IDEALS:||2014-12-15|
This item appears in the following Collection(s)
Dissertations and Theses - Electrical and Computer Engineering
Dissertations and Theses in Electrical and Computer Engineering
Graduate Dissertations and Theses at Illinois
Graduate Theses and Dissertations at Illinois