Files in this item

File: AL-SABBAGH-DISSERTATION-2015.pdf (636kB)
Access: Restricted to U of Illinois
Description: (no description provided)
Format: PDF (application/pdf)

Description

Title: A unified framework to identify and extract uncertainty cues, holders, and scopes in one fell-swoop
Author(s): Al-Sabbagh, Rania Mostafa
Director of Research: Girju, Roxana; Diesner, Jana
Doctoral Committee Chair(s): Girju, Roxana
Doctoral Committee Member(s): Benmamoun, Elabbas; Hockenmaier, John
Department / Program: Linguistics
Discipline: Linguistics
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: Ph.D.
Genre: Dissertation
Subject(s): Computational Semantics
Semitic Languages
Uncertainty
Social Media Analysis
Abstract: Uncertainty refers to the aspects of language that express hypotheses and speculations, where propositions are held as (un)certain, (im)probable, or (im)possible. Automatic uncertainty analysis is crucial for Natural Language Processing (NLP) applications that need to distinguish between factual (i.e., certain) and nonfactual (i.e., negated or uncertain) information. Typically, a comprehensive automatic uncertainty analyzer has three machine learning models: one each for uncertainty detection, attribution, and scope extraction. To date, and to the best of my knowledge, research on automatic uncertainty analysis has focused only on uncertainty detection and scope extraction, and has typically tackled each task with a different machine learning approach. Furthermore, this research has been restricted to specific languages, particularly English, and to specific linguistic genres, including biomedical and newswire texts, Wikipedia articles, and product reviews. In this research project, I attempt to address these limitations. First, I develop a machine learning model for uncertainty attribution, the task typically neglected in automatic uncertainty analysis. Second, I propose a unified framework to identify and extract uncertainty cues, holders, and scopes in one fell swoop by casting each task as a supervised token sequence labeling problem. Third, I choose to work on the Arabic language, in contrast to English, the most commonly studied language in the literature on automatic uncertainty analysis. Finally, I work on the understudied linguistic genre of tweets. This research project results in a novel NLP tool, i.e., a comprehensive automatic uncertainty analyzer for Arabic tweets, with a practical impact on NLP applications that rely on automatic uncertainty analysis.
The tool yields an F1 score of 0.759, averaged across its three machine learning models. Furthermore, through this research, the research community and I gain insights into (1) the challenges presented by Arabic as an agglutinative, morphologically rich language with a flexible word order, in contrast to English; (2) the challenges that the linguistic genre of tweets poses for automatic uncertainty analysis; and (3) the types of challenges that my proposed unified framework successfully addresses and for which it boosts performance.
Issue Date: 2015-04-17
Type: Thesis
URI: http://hdl.handle.net/2142/78621
Rights Information: Copyright 2015 Rania Mostafa Al-Sabbagh
Date Available in IDEALS: 2015-07-22
Date Deposited: May 2015

