Files in this item

File: KIM-DISSERTATION-2017.pdf (3MB)
Description: (no description provided)
Format: PDF

Description

Title: A schema conversion approach for constructing heterogeneous information networks from documents
Author(s): Kim, Hyung Sul
Doctoral Committee Chair(s): Han, Jiawei
Doctoral Committee Member(s): Hockenmaier, Julia; Zhai, ChengXiang; Dmitriev, Pavel
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: Ph.D.
Genre: Dissertation
Subject(s): Information Network Construction
Abstract: Information networks whose nodes and edges have multiple types with different semantics are called heterogeneous information networks. Because their multi-typed nodes and edges embed more complex information than homogeneous information networks, mining such networks has produced richer knowledge and insights. To extend heterogeneous information network analysis to document analysis, it is necessary to build information networks from a collection of documents while preserving the important information in those documents. This thesis describes a schema conversion approach that applies data mining techniques to the outputs of natural language processing (NLP) tools to construct heterogeneous information networks. First, we use named entity recognition (NER) tools to explore networks over entities, topics, and words, demonstrating how a probabilistic model can convert the data schema of the NER tools. Second, we present a pattern mining method that constructs a network of authors, documents, and writing styles by extracting discriminative writing styles from parse trees and converting them into nodes in the network. Third, we introduce a clustering method that merges redundant nodes in an information network of documents, claims, subjects, objects, and verbs. We use a semantic role labeling (SRL) tool to obtain initial network structures from news articles, and we merge duplicated nodes using a similarity measure, SynRank. Finally, we present a novel event mining framework, ProxiModel, for extracting high-quality structured event knowledge from large, redundant, and noisy news data. ProxiModel uses named entity recognition, time expression extraction, and phrase mining tools to obtain event information from documents.
Issue Date: 2017-04-19
Type: Thesis
URI: http://hdl.handle.net/2142/97387
Rights Information: Copyright 2017 Hyung Sul Kim
Date Available in IDEALS: 2017-08-10
Date Deposited: 2017-05

