Files in this item



Ming_Ji.pdf (application/pdf, 5MB)


Title: Semi-supervised learning and relevance search on networked data
Author(s): Ji, Ming
Director of Research: Han, Jiawei
Doctoral Committee Chair(s): Han, Jiawei
Doctoral Committee Member(s): Roth, Dan; Huang, Thomas S.; Chen, Yuguo; Ye, Jieping
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Data Mining; Machine Learning; Semi-supervised Learning; Heterogeneous Networks
Abstract: Real-world data entities are often connected by meaningful relationships, forming large-scale networks. With the rapid growth of social networks and online relational data, it is widely recognized that networked data play an increasingly important role in people's daily lives. Depending on whether their nodes and edges carry different semantic meanings, networks can be roughly categorized as heterogeneous or homogeneous. Although homogeneous networks have been studied for decades, some problems remain unsolved. Heterogeneous networks are much more complicated than homogeneous networks and have been explored only recently. Effective and principled algorithms for mining both homogeneous and heterogeneous networks are therefore in great demand. In this thesis, two important and closely related problems, semi-supervised learning and relevance search, are studied on both homogeneous and heterogeneous networks. Unlike many existing models, the algorithms developed in this thesis are theoretically reasonable and widely applicable with minimal constraints, and they provide more informative mining results. First, a label selection criterion is proposed to improve the effectiveness of existing semi-supervised learning models on networks. Second, ranking and semi-supervised learning are integrated to improve the informativeness of the results. Third, a relevance search algorithm that fully considers the geometric structure of homogeneous networked data is designed. Finally, the relevance search problem between different types of nodes on heterogeneous networks is studied, and the proposed solution is applied to a network constructed from unstructured text data. The research results introduced in this thesis provide advanced principles and the first few steps toward a complete and systematic solution for mining networked data.
Issue Date: 2014-01-16
Rights Information: Copyright 2013 Ming Ji
Date Available in IDEALS: 2014-01-16
Date Deposited: 2013-12
