IDEALS Home University of Illinois at Urbana-Champaign logo The Alma Mater The Main Quad

Density-based clustering of information networks by substructure optimization

Show full item record

Bookmark or cite this item: http://hdl.handle.net/2142/16220

Files in this item

File Description Format
PDF Bortner_Dustin.pdf (677KB) (no description provided) PDF
Title: Density-based clustering of information networks by substructure optimization
Author(s): Bortner, Dustin P.
Advisor(s): Han, Jiawei
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: M.S.
Genre: Thesis
Subject(s): density-based clustering networks classification graph mining data mining
Abstract: Information networks, such as biological or social networks, contain groups of related entities, which can be identified by clustering. Density-based clustering (DBC) differs from vertex-partitioning methods in that some vertices are classified as noise. This approach is useful in practice to classify groups of related entities within noisy networks. The baseline DBC method involves constructing a maximal spanning forest (MSF) and deleting edges having weights below a threshold, leaving the connected components as clusters. In large networks, the data may contain large scale variances in the noise density level, which causes the baseline method to perform poorly. We investigate whether clustering within substructures of the MSF can improve the result. We present a new set of similarity measures for weighted networks that allow the MSF to connect vertices that are not connected in the original network. We convert the MSF to a hierarchy and merge it bottom-up by optimizing an objective function to produce a flat clustering. We present two new objective functions based on density and irregularity measures. Experiments show that the new similarity measures help to normalize noise density levels and outperform traditional set-based similarity measures; that the proposed substructure optimization method improves over the baseline in networks with many classes; and that DBC outperforms vertex-partitioning methods for classification in noisy networks. The results of the irregularity objective are promising, as it scales with the number of vertices, whereas objectives such as density or modularity scale with the number of edges.
Issue Date: 2010-05-19
URI: http://hdl.handle.net/2142/16220
Rights Information: Copyright 2010 Dustin P. Bortner
Date Available in IDEALS: 2010-05-19
Date Deposited: May 2010
 

This item appears in the following Collection(s)

Show full item record

Item Statistics

  • Total Downloads: 232
  • Downloads this Month: 15
  • Downloads Today: 0

Browse

My Account

Information

Access Key