Files in this item

Kim_Kyoung-Young.pdf (PDF, 1 MB)


Title: Opinion Topic, Holder and Polarity in texts: exploration and automatic identification from cross-lingual data
Author(s): Kim, Kyoung-Young
Director of Research: Sproat, Richard W.
Doctoral Committee Chair(s): Girju, Roxana
Doctoral Committee Member(s): Sproat, Richard W.; Lasersohn, Peter N.; Zhai, ChengXiang
Department / Program: Linguistics
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Opinion mining; Sentiment analysis; English and Korean; Opinion extraction
Abstract: People express their opinions in various ways across different domains. With growing interest in what other people think, mining opinions in texts has attracted researchers in many different fields. In addition, with the rapid development of technology and the internet, more and more multilingual and multicultural information has become available on the web. The objective of this dissertation is to explore opinions in multilingual corpora and to extract them automatically. To this end, a bilingual (English and Korean) opinion-annotated corpus of editorial texts was constructed, focusing on detailed opinion factors. The annotated opinion factors are the holder of an opinion (Holder) and the topic of an opinion together with its polarity (Positive Topic, Negative Topic). The annotated corpus was used to investigate the factors used to express opinions, as well as how opinions are expressed across languages. The main contribution of this dissertation is a multilingual sentiment analysis system that identifies opinion factors using a novel method that exploits the linguistic structures used to express opinions. Without relying on pre-labeled opinion words, the system identifies opinion factors directly through syntactic analysis, predicate-argument structure, and pragmatic analysis. In place of pre-labeled opinion words for each language, a clustered lexicon was constructed from bilingual dictionaries, and the lexical features crucial for identifying polarity were learned automatically. In addition to these lexical features, syntactic, morphological, and contextual features were used in the learning algorithm. Both the syntactic structure of the sentence and the predicate-argument structures extracted from the PropBank database were investigated and used to assign appropriate features to the target chunk. Experimental results show that the proposed system performs significantly better than a baseline system.
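The abstract does not say how the clustered lexicon is built from the bilingual dictionaries. The following minimal sketch assumes one simple criterion, for illustration only: two words share a cluster if a chain of dictionary translation pairs links them. Under that assumption, clusters are the connected components of the translation graph, which union-find computes. The resulting cluster ids could then serve as language-independent lexical features. The function name `build_clustered_lexicon` and the toy dictionary entries are hypothetical and not taken from the dissertation.

```python
from collections import defaultdict


def build_clustered_lexicon(pairs):
    """Hypothetical sketch: cluster English and Korean words so that words
    linked by a chain of translation pairs receive the same cluster id.

    pairs: iterable of (english_word, korean_word) dictionary entries.
    Returns a dict mapping ("en", word) / ("ko", word) to an integer id.
    """
    parent = {}

    def find(x):
        # Register unseen nodes as their own root, then walk to the root,
        # halving the path as we go to keep later lookups short.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for en, ko in pairs:
        # Language tags keep identical strings from different languages apart.
        union(("en", en), ("ko", ko))

    # Collect the members of each connected component.
    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)

    # Assign stable ids by ordering clusters on their smallest member.
    lexicon = {}
    ordered = sorted(clusters.values(), key=min)
    for cid, members in enumerate(ordered):
        for node in members:
            lexicon[node] = cid
    return lexicon


if __name__ == "__main__":
    toy = [("criticize", "비판하다"), ("condemn", "비난하다"),
           ("criticize", "비난하다"), ("praise", "칭찬하다")]
    lex = build_clustered_lexicon(toy)
    print(lex[("en", "condemn")] == lex[("ko", "비판하다")])  # True
```

Transitive linking is a deliberately crude choice. On real dictionaries with polysemous entries it can merge unrelated senses into one large cluster, so an actual system would likely need a stricter similarity criterion.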
Experiments isolating each novel method verify that both the clustered lexicon and the incorporation of more linguistic structure improve the accuracy of opinion factor extraction. The proposed system was also tested on an existing English monolingual corpus of news articles (the MPQA corpus) and yielded results consistent with those on the annotated corpus. Within the multilingual experimental setup, the ways opinions are expressed across languages were investigated and used to improve the results of the analysis. Experiments with cross-lingual features extracted from parallel sentences show further improvements, suggesting cross-lingual reinforcement in identifying opinion factors with the proposed system.
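The abstract does not specify which cross-lingual features are extracted from the parallel sentences. The following minimal sketch assumes one plausible scheme, for illustration only. A word alignment links each English sentence to its Korean translation, and an English target chunk receives features from the Korean tokens aligned to it, including their cluster ids in a bilingual lexicon. The function name, the feature strings, and the assumption that Korean tokens are already lemmatized are all hypothetical.

```python
def cross_lingual_features(chunk_span, alignment, ko_tokens, lexicon):
    """Hypothetical sketch: features for an English chunk drawn from the
    Korean tokens aligned to it in a parallel sentence.

    chunk_span: (start, end) English token indices, end exclusive.
    alignment:  iterable of (en_index, ko_index) word-alignment links.
    ko_tokens:  tokens of the parallel Korean sentence (assumed lemmatized).
    lexicon:    mapping from ("ko", word) to a cluster id.
    Returns a sorted list of feature strings.
    """
    start, end = chunk_span
    # Korean positions linked to any English token inside the chunk.
    aligned = sorted({k for e, k in alignment if start <= e < end})
    features = set()
    for k in aligned:
        word = ko_tokens[k]
        features.add(f"ko_word={word}")
        cid = lexicon.get(("ko", word))
        if cid is not None:
            # The cluster id generalizes over Korean words with shared translations.
            features.add(f"ko_cluster={cid}")
    features.add(f"ko_aligned_count={len(aligned)}")
    return sorted(features)


if __name__ == "__main__":
    # English: "The editorial criticizes the policy"; chunk = "criticizes".
    lexicon = {("ko", "비판하다"): 0, ("ko", "칭찬하다"): 1}
    print(cross_lingual_features((2, 3), [(1, 0), (4, 1), (2, 2)],
                                 ["사설", "정책", "비판하다"], lexicon))
```

In a scheme like this, the Korean side can supply polarity evidence that is ambiguous or missing on the English side, which is one way the reported cross-lingual reinforcement could arise.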
Issue Date: 2011-05-25
Rights Information: Copyright 2011 Kyoung-Young Kim
Date Available in IDEALS: 2011-05-25
Date Deposited: 2011-05
