Title: Learning and Inference for Information Extraction
Author(s): Yih, Wen-tau
Subject(s): Natural Language Processing; Information Extraction; Machine Learning
Abstract: Information extraction is a process that extracts limited semantic concepts from text documents and presents them in an organized way. Unlike several other natural language tasks, information extraction has a direct impact on end-user applications. Despite its importance, information extraction is still a difficult task due to the inherent complexity and ambiguity of human languages. Moreover, mutual dependencies between local predictions of the target concepts further increase the difficulty of the task. To enhance information extraction technologies, we develop general approaches for two aspects: relational feature generation and global inference with classifiers. It has been quite convincingly argued that relational learning is suitable for training a complicated natural language system. We propose a relational feature generation approach that facilitates relational learning through propositional learning algorithms. In particular, we develop a relational representation language to produce features in a data-driven way. The resulting features capture the relational structures of a given domain, and therefore allow the learning algorithms to effectively learn the relational definitions of target concepts. Although the learned classifier can be used to directly predict the target concepts, conflicts between the labels of different target variables often occur due to imperfect classifiers. We propose an inference framework that corrects mistakes in the local predictions by combining the predictions with task-dependent constraints to produce the best global assignment. This inference framework can be modeled by a Bayesian network or integer linear programming. The proposed learning and inference frameworks have been applied to a variety of information extraction tasks, including entity extraction, entity/relation recognition, and semantic role labeling.
Issue Date: 2005-05
Genre: Technical Report
Other Identifier(s): UIUCDCS-R-2005-2534
Rights Information: You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format, BUT this permission is only for a period of 45 (forty-five) days from the most recent time that you verified that this technical report is still available from the University of Illinois at Urbana-Champaign Computer Science Department under terms that include this permission. All other rights are reserved by the author(s).
Date Available in IDEALS: 2009-04-17
