Files in this item

Files | Description | Format
LI-DISSERTATION-2015.pdf (10MB) | (no description provided) | PDF

Description

Title: Automatic discovery of complex causality
Author(s): Li, Chen
Director of Research: Girju, Corina R
Doctoral Committee Chair(s): Girju, Corina R
Doctoral Committee Member(s): Hasegawa-Johnson, Mark A.; Lasersohn, Peter N.; Shih, Chilin
Department / Program: Linguistics
Discipline: Linguistics
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: Ph.D.
Genre: Dissertation
Subject(s): Computational Linguistics
Natural Language Processing (NLP)
automata theory
formal semantics
data mining
causality
social networks
social media
Hidden Markov Model (HMM)
genetic algorithm
data prediction
big data
Abstract: This study involves understanding, and developing a computational method for automatically extracting, complex expressions in language that correspond to event-to-event sequential relations in the real world. We develop component procedures of a system that would be capable of taking raw linguistic input (such as narrative writing or social network data) and finding real-world semantic relations among events. Such an endeavor is applicable to many types of sequential relations; we use causality as a case study, both because it is a prominent type of sequential relation between events and because it is prevalent in natural language. We also demonstrate that the idea is applicable in principle to other major types of event-to-event relations, such as reciprocity. The study focuses primarily on causalities that contain complex structures and require in-depth linguistic analysis to discover and extract. Designing an automated method for extracting structurally complex causal expressions requires methodologies and theories beyond the conventional methods used in computational semantics. Adjunctive causal structures and embedded causal structures are hard to access with traditional methods but are more amenable to the methods developed in this study. The principal procedures employed to extract them are a heavily modified form of the Hidden Markov Model (HMM), which we use to handle causal structures with a sequentially complex makeup, and a highly modified Genetic Algorithm (GA), adapted for embedded context-free structures, which we use to rank and extract causal structures that have deep embedding at the syntax-semantics interface. These will be reformulated, augmented, and explored in depth.
With these methods, using unsupervised and semi-supervised learning, we were able to obtain reasonable results in discriminating causal pairs ⟨e_i, e_j⟩ and some longer chains of causation from corpora. From these results, we were also able to perform additional linguistic analysis of their theoretical semantic structure and to observe aspects of each that allow us to sub-classify the relations according to standard ideas from formal logic as well as from behavioral psychology. These methods would be critical to a system that builds a graph-theoretic representation of a social network from corpora produced by entities within that network, and similar approaches can be extended to model and discover other types of complex event relations. Such fundamental technologies would, in turn, help us design and build online and mobile services that have greater machine awareness of user behavior and can target and cater to individual users.
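The abstract does not spell out how the modified HMM operates. For orientation only, the sketch below shows plain, unmodified Viterbi decoding over a first-order HMM that tags each token as part of a cause span (C), an effect span (E), or neither (O). The tag set, vocabulary, and every probability are hypothetical toy values chosen for illustration; they are not taken from the dissertation, and the sketch reproduces neither its HMM modifications nor its genetic algorithm.

```python
import math

# Hypothetical tag set: C = token in a cause span, E = token in an effect span,
# O = token outside any causal span. All probabilities are invented toy values.
STATES = ["C", "E", "O"]

START = {"C": 0.4, "E": 0.1, "O": 0.5}
TRANS = {
    "C": {"C": 0.5, "E": 0.1, "O": 0.4},
    "E": {"C": 0.05, "E": 0.6, "O": 0.35},
    "O": {"C": 0.3, "E": 0.3, "O": 0.4},
}
EMIT = {
    "C": {"rain": 0.5, "flood": 0.1},
    "E": {"rain": 0.05, "flood": 0.6},
    "O": {"heavy": 0.2, "caused": 0.3, "the": 0.3},
}
UNK = 1e-4  # fallback probability for words a state has never emitted


def viterbi(tokens):
    """Return the most probable tag sequence for tokens under the toy HMM."""
    if not tokens:
        return []

    def emit(state, word):
        return math.log(EMIT[state].get(word, UNK))

    # best[t][s]: log-probability of the best path ending in state s at step t
    best = [{s: math.log(START[s]) + emit(s, tokens[0]) for s in STATES}]
    back = [{}]  # back[t][s]: predecessor state on that best path
    for t in range(1, len(tokens)):
        best.append({})
        back.append({})
        for s in STATES:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(TRANS[p][s])) for p in STATES),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + emit(s, tokens[t])
            back[t][s] = prev

    # Trace back from the highest-scoring final state.
    path = [max(STATES, key=lambda s: best[-1][s])]
    for t in range(len(tokens) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))
```

Under these toy parameters, `viterbi(["heavy", "rain", "caused", "the", "flood"])` returns `["O", "C", "O", "O", "E"]`, tagging "rain" as the cause and "flood" as the effect, i.e. a single ⟨e_i, e_j⟩ pair. Log-probabilities are used only to avoid numerical underflow on longer inputs.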
Issue Date: 2015-07-16
Type: Thesis
URI: http://hdl.handle.net/2142/88057
Rights Information: Copyright 2015 Chen Li
Date Available in IDEALS: 2015-09-29
Date Deposited: August 201
