Files in this item



application/pdfZHOU-THESIS-2019.pdf (799kB)
(no description provided)PDF


Title:Algorithms on graph-structured data with imperfect information
Author(s):Zhou, Huozhi
Advisor(s):Varshney, Lav R.
Department / Program:Electrical & Computer Eng
Discipline:Electrical & Computer Engr
Degree Granting Institution:University of Illinois at Urbana-Champaign
Subject(s):Graph data
Imperfect information
Efficient inference
Abstract:Graph-structured data is able to characterize pairwise or even higher-order relations among different data points, and has been demonstrated to be highly advantageous in various data mining and machine learning applications. Such graph-structured data may either come from real life networks, or some transformation based on data points. However, in practice the measurement of graph-structured data is usually partially incomplete or incorrect. For example, the measured states of nodes in the graph might be incorrect due to sensor noise. In this thesis, we study two problems on graph-structured data with imperfect information: hypergraph-based active learning and source estimation on directed acyclic graphs (DAGs), all with provable statistical guarantees. In the first part of this thesis, we propose an active learning scheme which is able to accommodate the structure of hypergraphs, termed HS2. HS2 generalizes the previously proposed S2 algorithm which is only able to solve graph-based active learning (GAL) with pointwise oracle. Our HS2 is more flexible in the sense that it is adaptable for three different types of oracles: pointwise oracle, pairwise oracle, as well as noisy pairwise oracle. Based on a novel parametric system particularly designed for hypergraphs, we derive theoretical results on the query complexity of HS2 for the above described settings. Both the theoretical and empirical results show that HS2 outperforms the naive combination of clique expansion and GAL algorithms. Next we develop a heuristic, termed generalized Jordan center (GJC), to estimate the source of a spreading process on a DAG based on noisy and incomplete observations. This problem is motivated by contamination diffusion in a food supply chain. For this setting, identifying the source correctly and efficiently as well as inferring states of unobserved events are of top priorities (the recall problem). We believe this is the first work on source estimation with noisy information. Under mild conditions, GJC is the maximum likelihood (ML) estimator of the diffusion source. Our proposed heuristic is parameter-free (only needs to know the structure of the DAG and states of some nodes), and can be evaluated efficiently by a message-passinglike algorithm in ~O (jV j) complexity (the tilde notation means ignoring the logarithm factor), where V is the vertex set. Experiments on both synthetic and real networks show that GJC has significant gains over a naive extension of Jordan center and is comparable to the exact ML estimate, in terms of source detection probability and false negative rate for recall.
Issue Date:2019-06-19
Rights Information:Copyright 2019 Huozhi Zhou
Date Available in IDEALS:2019-11-26
Date Deposited:2019-08

This item appears in the following Collection(s)

Item Statistics