IDEALS Home University of Illinois at Urbana-Champaign logo The Alma Mater The Main Quad

Density-Based Clustering of Information Networks by Substructure Optimization

Show full item record

Bookmark or cite this item: http://hdl.handle.net/2142/15489

Files in this item

File Description Format
PDF Bortner_Dustin.pdf (677KB) Thesis PDF
Title: Density-Based Clustering of Information Networks by Substructure Optimization
Author(s): Bortner, Dustin
Subject(s): data mining clustering graph mining information network
Abstract: Information networks, such as biological or social networks, contain groups of related entities, which can be identified by clustering. Density-based clustering (DBC) differs from vertex-partitioning methods in that some vertices are classified as noise. This approach is useful in practice to classify groups of related entities within noisy networks. The baseline DBC method involves constructing a maximal spanning forest (MSF) and deleting edges having weights below a threshold, leaving the connected components as clusters. In large networks, the data may contain large scale variances in the noise density level, which causes the baseline method to perform poorly. We investigate whether clustering within substructures of the MSF can improve the result. We present a new set of similarity measures for weighted networks that allow the MSF to connect vertices that are not connected in the original network. We convert the MSF to a hierarchy and merge it bottom-up by optimizing an objective function to produce a flat clustering. We present two new objective functions based on density and irregularity measures. Experiments show that the new similarity measures help to normalize noise density levels and outperform traditional set-based similarity measures; that the proposed substructure optimization method improves over the baseline in networks with many classes; and that DBC outperforms vertex-partitioning methods for classification in noisy networks. The results of the irregularity objective are promising, as it scales with the number of vertices, whereas objectives such as density or modularity scale with the number of edges.
Issue Date: 2010-05
Genre: Dissertation / Thesis
Type: Text
Language: English
URI: http://hdl.handle.net/2142/15489
Publication Status: unpublished
Peer Reviewed: not peer reviewed
Date Available in IDEALS: 2010-05-14
 

This item appears in the following Collection(s)

Show full item record

Item Statistics

  • Total Downloads: 190
  • Downloads this Month: 0
  • Downloads Today: 0

Browse

My Account

Information

Access Key