Files in this item

File: CHAUHAN-THESIS-2020.pdf (741kB), Restricted to U of Illinois
Description: (no description provided)
Format: PDF (application/pdf)

Description

Title: Semantic pattern discovery in open information extraction
Author(s): Chauhan, Aabhas
Advisor(s): Han, Jiawei
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: M.S.
Genre: Thesis
Subject(s): Information Extraction; Pattern Mining
Abstract: Open information extraction (OpenIE) is a novel paradigm that produces structured information from unstructured text with minimal or no supervision. The task involves extracting relevant relation tuples or expressions from a text corpus. Existing methods in the domain tend to produce a large percentage of ill-structured, incomplete, or redundant extractions that cannot be used directly in downstream applications, and they often fail on sentences with long and complex structures. In this paper, we propose a novel semantic pattern discovery framework for OpenIE (SemPatIE), which extracts relations in the form of typed textual pattern structures, called meta patterns, and groups semantically similar pattern structures. To perform these tasks, the framework uses three techniques: (1) it simplifies complex sentence structures with a context-aware sentence segmentation method that splits the dependency graph of a sentence at the noun or verb level and enables pattern extraction between distantly placed entities; (2) it extracts meta patterns and handles their sparsity problem by introducing a novel idea of iterative frequent pattern mining and nested push-ups; (3) it generates semantic pattern clusters by embedding a multi text-based network between entities, entity types, extracted meta patterns, and context words. Experiments show that SemPatIE outperforms state-of-the-art OpenIE baselines in handling structurally complex sentences and has significantly higher recall than existing pattern-based methods. Case studies exhibit the framework's high generalization ability and scalability, and its effective clustering performance, which has direct applications in downstream tasks such as knowledge graph construction, evidence mining, and truth finding.
Issue Date: 2020-05-13
Type: Thesis
URI: http://hdl.handle.net/2142/108194
Rights Information: Copyright 2020 Aabhas Chauhan
Date Available in IDEALS: 2020-08-26
Date Deposited: 2020-05

