Files in this item: KUMAR-DISSERTATION-2021.pdf (PDF, 4MB)


Title: Information sampling from online social networks
Author(s): Kumar, Suhansanu
Director of Research: Sundaram, Hari
Doctoral Committee Chair(s): Sundaram, Hari
Doctoral Committee Member(s): Tong, Hanghang; Koyejo, Sanmi; Jiang, Meng
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Keywords: online social network; reinforcement learning; hidden population
Abstract: Data sampling from online social networks is a prerequisite for several downstream applications. The massive size of online social networks, coupled with API limitations and restricted access to social information, makes sampling a challenging problem. This thesis addresses some of these challenges by proposing novel samplers for attributes (content), hidden attributes (population), and networks in online social networks. Specifically, in Chapter 3 we first propose an information-based sampler for sampling content from online social networks. We leverage the surprise of content to direct the sampler toward informative content. This surprise-based strategy efficiently samples the shape and boundary of content clusters, which is crucial for several data-mining tasks, including clustering, classification, regression, and attribute discovery. We demonstrate the sampler's efficacy on a suite of thirty real-world networks and four data-mining tasks. We further show, through an empirical counterfactual analysis, that network structure does not hinder the performance of surprise-based link-trace samplers in many real-world datasets. Next, in Chapter 4, we propose a novel attributed search-based sampler to sample hidden populations. We use a decision-tree-based search strategy to query the attribute search space systematically. Our proposed decision-tree Thompson sampler balances exploration and exploitation to sample hidden populations from social networks. We demonstrate its efficacy on a suite of fourteen sampling tasks drawn from three online social sites and five offline datasets. We also analyze how several factors, such as page size, missing information, and noise, affect hidden-population sampling in real-world social networks. Finally, in Chapter 5, we propose a novel framework for learning network samplers.
First, we show theoretically and empirically that no universal network sampler exists that can preserve all the topological properties of the underlying graph in the sample. To address this impossibility, we propose a reinforcement learning framework that learns high-quality sampling policies tailored to application needs. We demonstrate the framework's efficacy through extensive experiments across ten graph families and seven diverse tasks. In summary, this thesis develops several strategies for sampling information (attributes, hidden attributes, and networks) from online social networks while remaining cognizant of the constraints imposed by API restrictions. We propose adaptive samplers that can cater to different application needs.
Issue Date: 2021-04-14
Rights Information: Copyright 2021 Suhansanu Kumar
Date Available in IDEALS: 2021-09-17
Date Deposited: 2021-05