Files in this item

FilesDescriptionFormat

application/pdf

application/pdfSHANG-THESIS-2019.pdf (591kB)
(no description provided)PDF

Description

Title:A reformulated approach to attribute-aware sampling on large networks
Author(s):Shang, Charles
Advisor(s):Sundaram, Hari
Department / Program:Computer Science
Discipline:Computer Science
Degree Granting Institution:University of Illinois at Urbana-Champaign
Degree:M.S.
Genre:Thesis
Subject(s):Sampling
Networks
Data Mining
Abstract:Sampling has long been an important tool for extracting subsets of data for data mining tasks. As the scale of information produced has increased, efficient sampling is only becoming more important. Uniform sampling is often the preferred technique of choice, due to its simplicity and speed. However, many network based data sources prevent random access, necessitating a different way to sample. Algorithms like Breadth first search, Random walk, Expansion sampling, or other related strategies fulfill this role currently. But these algorithms are focused mainly on ensuring properties based on the structure of the graph, without consideration for the attributes of each node. In this study, we take an existing attribute aware sampler and propose a natural reformulation of the algorithm. We present a new surprise function that avoids some drawbacks of a previous work and take advantage of the submodularity property to reduce the computation that needs to be done when selecting a node and make some arguments about the efficiency and effectiveness of such a strategy. We test our algorithm on some real world data sets and found that our algorithm had increases in sample attribute coverage by up to 4 times when compared to techniques like random walk while still taking time approximately linear in the size of the sample
Issue Date:2019-04-22
Type:Text
URI:http://hdl.handle.net/2142/104885
Rights Information:Copyright 2019 Charles Shang
Date Available in IDEALS:2019-08-23
Date Deposited:2019-05


This item appears in the following Collection(s)

Item Statistics