Files in this item

FilesDescriptionFormat

application/pdf

application/pdfQuanquan_Gu.pdf (1MB)
(no description provided)PDF

Description

Title:Online and active learning of big networks: theory and algorithms
Author(s):Gu, Quanquan
Director of Research:Han, Jiawei
Doctoral Committee Chair(s):Han, Jiawei
Doctoral Committee Member(s):Roth, Dan; Zhai, ChengXiang; Aggarwal, Charu C.
Department / Program:Computer Science
Discipline:Computer Science
Degree Granting Institution:University of Illinois at Urbana-Champaign
Degree:Ph.D.
Genre:Dissertation
Subject(s):Big Networks
Online Learning
Active Learning
Abstract:We are living in the Internet Age, in which information entities and objects are interconnected, thereby forming gigantic information networks. These networks are not only massive, but also grow and evolve very quickly. It is critical to quickly process and understand these networks in order to enable data-driven applications. On the other hand, the labels of the nodes in big networks are scarce. It is urgent to optimize the process by which the labels are collected, because it is unrealistic to get labels of every node. The objective of my research is to develop algorithms for big network analytics, which are both statistically and computationally efficient, and with provable guarantee on their performance. In particular, I present active learning, online learning, selective sampling (online active learning), and online learning with bandit feedback algorithms for learning in a network. In the first part of this thesis, I propose a \textit{nonadaptive} active learning approach on a graph, based on generalization error bound minimization. In particular, I present a data-dependent error bound for a graph-based learning method, namely learning with local and global consistency (LLGC). I show that the empirical transductive Rademacher complexity of the function class for LLGC provides a natural criterion for active learning. The resulting active learning approach is to select a subset of nodes on a graph such that the empirical transductive Rademacher complexity of LLGC is minimized. I propose a simple yet effective sequential optimization algorithm to solve it. In the second part of this thesis, I first present an online learning algorithm on a graph for binary node classification. It is an online version of the well-known Learning with Local and Global Consistency method (OLLGC). It is essentially a second-order online learning algorithm, and can be seen as an online ridge regression in the Hilbert space of functions defined on a graph. I prove its regret bound in terms of the structural property (cut size) of a graph. Based on OLLGC, I present a selective sampling algorithm, namely Selective Sampling with Local and Global Consistency (SSLGC), which adaptively queries the label of each node based on the confidence of the linear function on a graph. Its regret bound as well as its label complexity are also derived. I also analyze the low-rank approximation of graph kernels, which enables the online algorithms scale to large graphs. In the third part of this thesis, I first extend the online binary classification algorithm to multi-class regime, based on spectral learning technique. The resulting algorithm is called online spectral learning on a graph (OSLG). Then I study online learning on a graph with bandit feedback, where after the learner makes a prediction of the node label, the oracle will return a single bit indicating whether the prediction is correct or not. I propose an algorithm namely online spectral learning on a graph with bandit feedback (OSLG Bandit), which uses upper-confidence bound of the nonparametric classifier on a graph to trade off the exploration and exploitation of label information. I derive regret bounds for both algorithms, which clearly characterize the difficulty of online learning on a graph for multi-class node classification, both in the full information setting and in the partial information setting. All in all, my efforts in this thesis have provided important findings in these areas, and have yielded new algorithms and theory.
Issue Date:2014-09-16
URI:http://hdl.handle.net/2142/50757
Rights Information:Copyright 2014 Quanquan Gu
Date Available in IDEALS:2014-09-16
Date Deposited:2014-08


This item appears in the following Collection(s)

Item Statistics