Files in this item



3102009.pdf (application/pdf, 6MB), restricted to U of Illinois


Title: Machine Learning for Information Extraction
Author(s): Zelenko, Dmitry
Doctoral Committee Chair(s): Roth, Dan
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Artificial Intelligence
Abstract: The dissertation presents a number of novel machine learning techniques and applies them to information extraction. The study addresses several information extraction subtasks: part of speech tagging, entity extraction, coreference resolution, and relation extraction. Each task is formalized as a learning problem, and appropriate learning algorithms are developed and applied to it. The dissertation studies part of speech tagging as a multi-class classification problem, and applies the SNoW (Sparse Network of Winnows) learning system to learn a part of speech classifier. A comprehensive experimental evaluation of the system confirms that it is appropriate for NLP applications. The dissertation addresses the problem of entity extraction in conjunction with coreference resolution. A classification approach is presented for entity extraction, and coreference resolution is treated from the decoding perspective. The dissertation describes novel decoding algorithms that, given local coreference decisions, produce a globally coherent interpretation of document entities. The dissertation studies relation extraction as a classification problem, and applies kernel methods to learn the relation classifiers. Novel kernels are defined in terms of shallow parses, and efficient algorithms are given for computing the kernels. The study evaluates the kernel approach experimentally, with positive results. The dissertation combines the constituent solutions to present a single coherent information extraction system and concludes that machine learning is a viable methodology for designing natural language processing applications.
Issue Date: 2003
Description: 116 p.
Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 2003.
Other Identifier(s): (MiAaPQ)AAI3102009
Date Available in IDEALS: 2015-09-25
Date Deposited: 2003
