Files in this item



application/pdf: Mehwish_Riaz.pdf (15MB)


Title:Mining novel sources of knowledge to identify causal information in text
Author(s):Riaz, Mehwish
Director of Research:Girju, Roxana
Doctoral Committee Chair(s):Girju, Roxana
Doctoral Committee Member(s):Zhai, ChengXiang; Hockenmaier, Julia C.; Di Eugenio, Barbara
Department / Program:Computer Science
Discipline:Computer Science
Degree Granting Institution:University of Illinois at Urbana-Champaign
Subject(s):Discourse Processing; Natural Language Semantics
Abstract:The abundance of information on the internet has greatly affected people's lives. People use the internet to acquire information for many day-to-day social and political activities. Although this wealth of information is of great use, it takes a lot of time to go through a number of text articles to understand the events, and the causal relations between events, that build a particular social or political news story. In this thesis, we focus on the problem of automated extraction of causal information in text. This can be of great assistance to people who strive to follow the flow of events in text in order to make decisions and predict the consequences of those decisions. In natural language, causal relations can be encoded using various linguistic constructions. Each construction, with its own semantics, poses distinct challenges for the problem of identifying causality. In this thesis, we address the tasks of identifying causality between two verbs, and between a verb and a noun, by deeply analyzing the semantics of these constructions. Following the successful use of linguistic features for various Natural Language Processing (NLP) tasks, several approaches have been proposed to identify causality using such features in the framework of supervised learning. However, it is not practical to depend merely on these features because many factors are involved in identifying causality, such as background knowledge, semantic and pragmatic features of events, and world knowledge. In addition, supervised learning approaches are sensitive to the size of the training corpus and the type of contexts of the training instances. For example, unambiguous training instances do not provide good supervision for ambiguous and implicit instances of semantic relations, including causality [Sporleder and Lascarides 2008].
Therefore, in this work, instead of relying merely on the linguistic features extracted from the contexts of training instances, we propose an approach to derive novel sources of knowledge for identifying causal information in text. In the first part of this thesis, we introduce methods to acquire background knowledge and knowledge of the causal semantics of verbs for the task of identifying causality between two states of affairs represented by verbs. After the knowledge acquisition step, we integrate these types of knowledge with a supervised classifier employing linguistic features to obtain optimal predictions for this task. Similarly, in the second part of this thesis, we propose methods to acquire and employ knowledge of the causal semantics of nouns, verbs, and verb frames to identify causality between two states of affairs represented by verbs and nouns. With the addition of these novel sources of knowledge, our models improve substantially in performance over the baseline of supervised classifiers that rely merely on linguistic features. Moreover, in comparison with these supervised classifiers, the performance of our models is more robust across all types of context, i.e., unambiguous, ambiguous, and implicit contexts.
Issue Date:2014-05-30
Rights Information:Copyright 2014 by Mehwish Riaz
Date Available in IDEALS:2014-05-30
Date Deposited:2014-05
