Files in this item

File: 9314848.pdf (5MB) — Restricted to U of Illinois
Description: (no description provided)
Format: PDF (application/pdf)
Description

Title: Learning and Smooth Simultaneous Estimation of Errors Based on Empirical Data
Author(s): Buescher, Kevin Lee
Doctoral Committee Chair(s): Kumar, P.R.
Department / Program: Electrical Engineering
Discipline: Electrical Engineering
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: Ph.D.
Genre: Dissertation
Subject(s): Statistics; Engineering, System Science; Artificial Intelligence; Computer Science
Abstract: This thesis examines issues related to Valiant's Probably Approximately Correct (PAC) model for learning from examples. In this model, a student observes examples that consist of sample points drawn according to a fixed, unknown probability distribution and labeled by a fixed, unknown binary-valued function. Based on this empirical data, the student must select, from a set of candidate functions, a particular function, or "hypothesis," that will accurately predict the labels of future sample points. The expected mismatch between a hypothesis' prediction and the label of a new sample point is called that hypothesis' "generalization error."
We treat a more realistic problem than Valiant's model in this thesis. We allow the labels and the hypotheses to take general values (rather than just the values zero and one). We do not assume that a hypothesis with zero generalization error exists. Further, we do not even assume that a deterministic functional relationship exists between the sample points and the labels. In particular, the observed values of the labels and sample points may be subject to noise. Also, we allow prior knowledge about the set of probability distributions to be incorporated into the problem. In addition, we address the issue of determining the appropriate complexity for the class of candidate hypotheses. This relates to the tendency of more complex candidate hypotheses to fit the noise in the data, with an attendant increase in generalization error.
Following the pioneering work of Vapnik and Chervonenkis, others have attacked this sort of learning problem by finding hypotheses that minimize the relative frequency-based empirical error estimate. We generalize this approach by examining the "simultaneous estimation" problem: When does some procedure exist for estimating the generalization error of all of the candidate hypotheses, simultaneously, from the same labeled sample? We demonstrate how one can learn from such a simultaneous error estimate and propose a new class of estimators, called "smooth estimators," that, in many cases of interest, contains the empirical estimator. We characterize the class of simultaneous estimation problems solvable by a smooth estimator. We give a canonical form for the smooth simultaneous estimator. Further, we show that, when the empirical estimator is smooth, a learning procedure based on the canonical estimator will work in every case in which empirical error minimization does. We derive bounds to show that the number of samples required by the canonical estimator and the empirical estimator is comparable.
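The relative frequency-based empirical error estimate mentioned above, and learning by minimizing it, can be illustrated with a minimal sketch. This is not code from the thesis; the toy threshold-classifier setup and all function names here are illustrative assumptions, showing only the classical empirical error minimization approach that the thesis generalizes.

```python
import random

def empirical_error(hypothesis, sample):
    """Relative-frequency estimate of a hypothesis' generalization error:
    the fraction of labeled examples the hypothesis mislabels."""
    return sum(1 for x, y in sample if hypothesis(x) != y) / len(sample)

def minimize_empirical_error(hypotheses, sample):
    """Empirical error minimization: select the candidate hypothesis
    with the smallest empirical error on the labeled sample."""
    return min(hypotheses, key=lambda h: empirical_error(h, sample))

# Toy setup (illustrative): threshold classifiers on [0, 1],
# with sample points labeled by a fixed target threshold of 0.5.
random.seed(0)
target = lambda x: 1 if x >= 0.5 else 0
sample = [(x, target(x)) for x in (random.random() for _ in range(200))]
candidates = [lambda x, t=t: 1 if x >= t else 0 for t in (0.1, 0.3, 0.5, 0.7)]

best = minimize_empirical_error(candidates, sample)
print(empirical_error(best, sample))  # 0.0: the matching threshold fits exactly
```

The empirical estimator evaluates every candidate simultaneously from the same labeled sample, which is precisely the "simultaneous estimation" structure the thesis generalizes with its class of smooth estimators.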
Issue Date: 1993
Type: Text
Description: 115 p.
Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 1993.
URI: http://hdl.handle.net/2142/71990
Other Identifier(s): (UMI)AAI9314848
Date Available in IDEALS: 2014-12-16
Date Deposited: 1993
