Files in this item

Chi_Wang.pdf (application/pdf, 3MB)

Title: Mining latent entity structures from massive unstructured and interconnected data
Author(s): Wang, Chi
Director of Research: Han, Jiawei
Doctoral Committee Chair(s): Han, Jiawei
Doctoral Committee Member(s): Zhai, ChengXiang; Roth, Dan; Chakrabarti, Kaushik
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): data mining; text mining; information network; social network; network analysis; probabilistic graphical model; topic model; phrase mining; relation mining; information extraction
Abstract: The “big data” era is characterized by an explosion of information in the form of digital data collections, ranging from scientific knowledge to social media, news, and everyone’s daily life. Valuable knowledge about multi-typed entities is often hidden in unstructured or loosely structured but interconnected data. Mining latent structured information around entities uncovers semantic structures from massive unstructured data and hence enables many high-impact applications, including taxonomy or knowledge base construction, multi-dimensional data analysis, and information or social network analysis. A mining framework is proposed to solve and integrate a chain of tasks: hierarchical topic discovery, topical phrase mining, entity role analysis, and entity relation mining. It reveals two main forms of structure: topical and relational. The topical structure summarizes the topics associated with entities at various granularities, such as the research areas in computer science. The framework enables recursive construction of a phrase-represented and entity-enriched topic hierarchy from text-attached information networks. It achieves a breakthrough in both quality and computational efficiency. The relational structure recovers hidden relationships among entities, such as advisor-advisee. A probabilistic graphical modeling approach is proposed. The method can utilize heterogeneous attributes and links to capture all kinds of semantic signals, including constraints and dependencies, to recover hierarchical relationships with the best known accuracy.
Issue Date: 2015-01-21
Rights Information: Copyright 2014 Chi Wang
Date Available in IDEALS: 2015-01-21
Date Deposited: 2014-12
