Files in this item

File:ECE499-Sp2016-rameshkumar.pdf (1MB), restricted to U of Illinois
Description:(no description provided)
Format:PDF

Description

Title:Joint classification and information extraction framework
Author(s):Rameshkumar, Revanth
Contributor(s):Nahrstedt, Klara
Degree:B.S. (bachelor's)
Genre:Thesis
Subject(s):natural language processing
machine learning models
joint models
text classification
information extraction
Abstract:This thesis proposes a joint information extraction and classification model for document analysis in domain-specific text. Existing information extraction (IE) systems typically extract key-value pairs or target phrases either by learning from user-provided examples or by relying on a strong named-entity tagger, as in the Snowball information extraction system. Others, while not depending on user-provided IE patterns, depend instead on data tagged with part-of-speech, syntactic, or semantic labels to extract target phrases, or on heavily annotated text to build a learning dictionary. The disadvantage of these approaches is that building a usable training dataset takes many man-hours. This is especially problematic when the cost of assigning a domain expert to tasks like tagging and annotating is too great to be practical. This thesis describes a prototype system, RICE (Rev’s Iterative Classifier Extractor), that extracts information from domain-specific text using only a set of documents labeled as domain relevant or domain irrelevant. From these labeled documents alone, the system outputs a set of relevant phrases, an IE pattern ranker model, and a usable document classifier. The approach is iterative: extracted noun phrases are used simultaneously to train a classifier and to build a ranked list of IE patterns. The results show that the joint classification and IE approach performs sufficiently better than chance to merit further pursuit, and that it has the potential to be used in production systems. However, considerable work remains to eliminate noise and increase precision. The thesis closes with a discussion of next steps, improvements, applications, and future work.
Issue Date:2016-05
Genre:Dissertation / Thesis
Type:Text
Language:English
URI:http://hdl.handle.net/2142/91561
Date Available in IDEALS:2016-08-30