Files in this item



QIU-THESIS-2021.pdf (application/pdf, 1MB)


Title: Mining social media stimulus from news article text using weakly-supervised narrative classification
Author(s): Qiu, Wenda
Advisor(s): Han, Jiawei
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Text Mining; News Classification
Abstract: To simulate social media accurately, we first need to find the stimulus in external sources. In this work, we model stimulus mining as a narrative classification task on a news article dataset. Previous state-of-the-art text classification methods cannot be applied directly here, mainly because of three challenges: 1) Lack of training data: the news article data has no narrative labels, and we cannot afford manual labeling beyond a small evaluation set. 2) Complexity of narratives: narratives are defined in a more complex way than the classes used in a classical news classification dataset, which prevents us from using existing weakly supervised text classification methods that depend heavily on class name semantics. 3) Noisy news article dataset: the collected documents are not guaranteed to belong to any of the narratives, which limits the power of the self-training strategy widely used in existing weakly supervised methods. To address these challenges, we propose a narrative decomposition and re-grouping strategy and a relevance filtering module that together make full use of weakly supervised classification methods. We conduct extensive experiments on two datasets set against real global events and further propose two ways to combine different results into an optimal stimulus time series.
Issue Date: 2021-04-27
Rights Information: Copyright 2021 Wenda Qiu
Date Available in IDEALS: 2021-09-17
Date Deposited: 2021-05
