Files in this item

BU-DISSERTATION-2019.pdf (1 MB, application/pdf)
Description

Title: Information-theoretic bounds in learning algorithms
Author(s): Bu, Yuheng
Director of Research: Veeravalli, Venugopal V.
Doctoral Committee Chair(s): Veeravalli, Venugopal V.
Doctoral Committee Member(s): Raginsky, Maxim; Varshney, Lav R.; Small, Kevin
Department / Program: Electrical & Computer Eng
Discipline: Electrical & Computer Engr
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: Ph.D.
Genre: Dissertation
Subject(s): active and adaptive learning
model change detection
mutual information based generalization bounds
model compression
Abstract: The focus of this thesis is on understanding machine learning algorithms from an information-theoretic point of view. More specifically, we apply information-theoretic tools to construct performance bounds for learning algorithms, with the goal of deepening the understanding of current algorithms and inspiring new learning techniques.

The first problem considered involves a sequence of machine learning problems that vary in a bounded manner from one time-step to the next. To solve these problems in an accurate and data-efficient way, an active and adaptive learning framework is proposed, in which the labels of the most informative samples are actively queried from an unlabeled data pool, and adaptation to the change is achieved by utilizing the information acquired in previous steps. The goal is to satisfy a pre-specified bound on the excess risk at each time-step. More specifically, the design of the active querying algorithm is based on minimizing the excess risk using stochastic gradient descent in the maximum likelihood estimation setting. Our algorithm and theoretical results are validated by experiments with synthetic and real data.

To determine whether the active and adaptive learning framework is applicable in practice, we then study the problem of model change detection. Two sets of samples are generated according to a pre-change probabilistic model with parameter theta and a post-change model with parameter theta', respectively. The goal is to detect whether the change in the model is significant. We construct an empirical difference test (EDT), which has low computational complexity. Moreover, we provide an approximation method to set the threshold of the EDT to meet the false alarm constraint. Experiments with linear regression and logistic regression are conducted to validate the proposed algorithms.

Another key contribution of this thesis is in the area of mutual information-based generalization error bounds for supervised learning algorithms. Our bound is constructed in terms of the mutual information between each individual training sample and the output of the learning algorithm; it requires weaker conditions on the loss function and provides a tighter characterization of the generalization error than existing bounds. Examples are further provided to demonstrate that the proposed bound is tighter and has a broader range of applicability.

Finally, an application of this mutual information-based generalization error bound is considered. We show that model compression can improve the population risk of a pre-trained model, by studying the tradeoff between the decrease in the generalization error and the increase in the empirical risk under model compression. We first prove that model compression reduces the mutual information-based generalization error bound; this allows for an interpretation of model compression as a regularization technique to avoid overfitting. We then characterize the increase in empirical risk under model compression using rate distortion theory. We show through a linear regression example that such an improvement in population risk due to model compression is indeed possible. Our theoretical results further suggest that the Hessian-weighted K-means clustering compression approach can be improved by regularizing the distance between the clustering centers. We provide experiments with neural networks to support our theoretical assertions.
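The flavor of the per-sample mutual information bound described in the abstract can be illustrated with a standard toy case (not taken from the dissertation itself, so the setup and names here are illustrative): estimating the mean of a Gaussian, where the algorithm output W is the empirical mean of n i.i.d. N(mu, sigma^2) samples. In that case W and each sample Z_i are jointly Gaussian and I(W; Z_i) has the closed form 0.5 * log(n / (n - 1)), so a bound of the shape (1/n) * sum_i sqrt(2 * sigma'^2 * I(W; Z_i)) can be computed exactly. The function names and the sub-Gaussian parameter below are assumptions for the sketch.

```python
import math

def per_sample_mi(n: int) -> float:
    # I(W; Z_i) for W = (1/n) * sum_j Z_j with Z_j i.i.d. N(mu, sigma^2).
    # W and Z_i are jointly Gaussian with squared correlation 1/n, so
    # I(W; Z_i) = -0.5 * log(1 - 1/n) = 0.5 * log(n / (n - 1)).
    return 0.5 * math.log(n / (n - 1))

def individual_mi_bound(n: int, sub_gaussian_sigma: float) -> float:
    # Bound of the form (1/n) * sum_i sqrt(2 * sigma'^2 * I(W; Z_i)).
    # By symmetry every I(W; Z_i) is identical here, so the sum is n equal terms.
    return sum(
        math.sqrt(2 * sub_gaussian_sigma**2 * per_sample_mi(n))
        for _ in range(n)
    ) / n

# The per-sample bound decays roughly like 1/sqrt(n) as n grows.
for n in (10, 100, 1000):
    print(n, individual_mi_bound(n, sub_gaussian_sigma=1.0))
```

Since log(n / (n - 1)) is approximately 1/n for large n, each term is about sqrt(1/n), matching the familiar O(1/sqrt(n)) generalization rate; a bound built from the full training set's mutual information I(W; S) can be looser or even vacuous in settings where these per-sample quantities stay finite.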
Issue Date: 2019-08-14
Type: Text
URI: http://hdl.handle.net/2142/106140
Rights Information: Copyright 2019 Yuheng Bu
Date Available in IDEALS: 2020-03-02
Date Deposited: 2019-12

