Files in this item



application/pdfBortner_Dustin.pdf (677kB)
(no description provided)PDF


Title:Density-based clustering of information networks by substructure optimization
Author(s):Bortner, Dustin P.
Advisor(s):Han, Jiawei
Department / Program:Computer Science
Discipline:Computer Science
Degree Granting Institution:University of Illinois at Urbana-Champaign
graph mining
data mining
Abstract:Information networks, such as biological or social networks, contain groups of related entities, which can be identified by clustering. Density-based clustering (DBC) differs from vertex-partitioning methods in that some vertices are classified as noise. This approach is useful in practice to classify groups of related entities within noisy networks. The baseline DBC method involves constructing a maximal spanning forest (MSF) and deleting edges having weights below a threshold, leaving the connected components as clusters. In large networks, the data may contain large scale variances in the noise density level, which causes the baseline method to perform poorly. We investigate whether clustering within substructures of the MSF can improve the result. We present a new set of similarity measures for weighted networks that allow the MSF to connect vertices that are not connected in the original network. We convert the MSF to a hierarchy and merge it bottom-up by optimizing an objective function to produce a flat clustering. We present two new objective functions based on density and irregularity measures. Experiments show that the new similarity measures help to normalize noise density levels and outperform traditional set-based similarity measures; that the proposed substructure optimization method improves over the baseline in networks with many classes; and that DBC outperforms vertex-partitioning methods for classification in noisy networks. The results of the irregularity objective are promising, as it scales with the number of vertices, whereas objectives such as density or modularity scale with the number of edges.
Issue Date:2010-05-19
Rights Information:Copyright 2010 Dustin P. Bortner
Date Available in IDEALS:2010-05-19
Date Deposited:May 2010

This item appears in the following Collection(s)

Item Statistics