Files in this item



application/pdfLi_Yun.pdf (367kB)
(no description provided)PDF


Title:Binary classification with training under both classes
Author(s):Li, Yun
Advisor(s):Veeravalli, Venugopal V.
Department / Program:Electrical & Computer Eng
Discipline:Electrical & Computer Engr
Degree Granting Institution:University of Illinois at Urbana-Champaign
Subject(s):binary hypothesis testing
countably infinite alphabet
binary classification
Stein's lemma
finite sample performance
Abstract:This thesis focuses on the binary classification problem with training data under both classes. We first review binary hypothesis testing problems and present a new result on the case of countably infinite alphabet. The goal of binary hypothesis testing is to decide between the two underlying probabilistic processes. Asymptotic optimality of binary hypothesis testing can be achieved with the knowledge of only one of the processes. It is also shown that the finite sample performance could improve greatly with additional knowledge of the alternate process. Most previous work focuses on the case where the alphabet is finite. This thesis extends the existing results to the case of countably infinite alphabet. It is proved that, without knowledge of the alternate process, the worst-case performance of any test is arbitrarily bad, even when the alternate process is restricted to be ``far'' in the sense of relative entropy. Binary classification problems arise in applications where a full probabilistic model of either of the processes is absent and pre-classified samples from both of the processes are available. It is known that asymptotic optimality can be achieved with the knowledge of only one pre-classified training sequence. We propose a classification function that depends on both training sequences. Then Stein's lemma for classification is proved using this new classification function. It states that the maximal error exponent under one class is given by the relative entropy between the conditional distributions of the two classes. Our results also shed light on how the classification errors depend on the relative size of the training and test data. It is shown in the simulation results that our classification method outperforms the asymptotically optimal one when the test samples are of limited size.
Issue Date:2012-09-18
Rights Information:Copyright 2012 Yun Li
Date Available in IDEALS:2012-09-18
Date Deposited:2012-08

This item appears in the following Collection(s)

Item Statistics