IDEALS Home University of Illinois at Urbana-Champaign logo The Alma Mater The Main Quad

Generalization Bounds for the Area Under an ROC Curve

Show full item record

Bookmark or cite this item: http://hdl.handle.net/2142/10825

Files in this item

File Description Format
PDF Generalization ... rea Under an ROC Curve.pdf (375KB) (no description provided) PDF
Title: Generalization Bounds for the Area Under an ROC Curve
Author(s): Agarwal, Shivani; Graepel, Thore; Herbrich, Ralf; Har-Peled, Sariel; Roth, Dan
Subject(s): Algorithms
Abstract: We study generalization properties of the area under an ROC curve (AUC), a quantity that has been advocated as an evaluation criterion for bipartite ranking problems. The AUC is a different and more complex term than the error rate used for evaluation in classification problems; consequently, existing generalization bounds for the classification error rate cannot be used to draw conclusions about the AUC. In this paper, we define a precise notion of the expected accuracy of a ranking function (analogous to the expected error rate of a classification function), and derive distribution-free probabilistic bounds on the deviation of the empirical AUC of a ranking function (observed on a finite data sequence) from its expected accuracy. We derive both a large deviation bound, which serves to bound the expected accuracy of a ranking function in terms of its empirical AUC on a test sequence, and a uniform convergence bound, which serves to bound the expected accuracy of a learned ranking function in terms of its empirical AUC on a training sequence. Our uniform convergence bound is expressed in terms of a new set of combinatorial parameters that we term the bipartite rank-shatter coefficients; these play the same role in our result as do the standard shatter coefficients (also known variously as the counting numbers or growth function) in uniform convergence results for the classification error rate. We also compare our result with a recent uniform convergence result derived by Freund et al. (2003) for a quantity closely related to the AUC; as we show, the bound provided by our result is considerably tighter.
Issue Date: 2004-05
Genre: Technical Report
Type: Text
URI: http://hdl.handle.net/2142/10825
Rights Information: You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format, BUT this permission is only for a period of 45 (forty-five) days from the most recent time that you verified that this technical report is still available from the University of Illinois at Urbana-Champaign Computer Science Department under terms that include this permission. All other rights are reserved by the author(s).
Date Available in IDEALS: 2009-04-14
 

This item appears in the following Collection(s)

Show full item record

Item Statistics

  • Total Downloads: 326
  • Downloads this Month: 10
  • Downloads Today: 0

Browse

My Account

Information

Access Key