A schema conversion approach for constructing heterogeneous information networks from documents
Kim, Hyung Sul
Permalink
https://hdl.handle.net/2142/97387
Description
- Title
- A schema conversion approach for constructing heterogeneous information networks from documents
- Author(s)
- Kim, Hyung Sul
- Issue Date
- 2017-04-19
- Doctoral Committee Chair(s)
- Han, Jiawei
- Committee Member(s)
- Hockenmaier, Julia
- Zhai, ChengXiang
- Dmitriev, Pavel
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Information network construction
- Abstract
- Information networks with multi-typed nodes and edges that carry different semantics are called heterogeneous information networks. Because their multi-typed nodes and edges embed more complex information than homogeneous information networks do, mining such networks has produced richer knowledge and insights. To extend heterogeneous information network analysis to document analysis, it is necessary to build information networks from a collection of documents while preserving the important information in those documents. This thesis describes a schema conversion approach that applies data mining techniques to the outcomes of natural language processing (NLP) tools to construct heterogeneous information networks. First, we use named entity recognition (NER) tools to explore networks over entities, topics, and words, demonstrating how a probabilistic model can convert the data schema of the NER tools. Second, we present a pattern mining method that constructs a network of authors, documents, and writing styles by extracting discriminative writing styles from parse trees and converting them into nodes in a network. Third, we introduce a clustering method to merge redundant nodes in an information network of documents, claims, subjects, objects, and verbs. We use a semantic role labeling (SRL) tool to obtain initial network structures from news articles, and merge duplicated nodes using a similarity measure, SynRank. Finally, we present a novel event mining framework for extracting high-quality structured event knowledge from large, redundant, and noisy news data. The proposed framework, ProxiModel, uses named entity recognition, time expression extraction, and phrase mining tools to obtain event information from documents.
- Graduation Semester
- 2017-05
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/97387
- Copyright and License Information
- Copyright 2017 Hyung Sul Kim
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)
Graduate Theses and Dissertations at Illinois
Dissertations and Theses - Computer Science
Dissertations and Theses from the Dept. of Computer Science