Files in this item: MORALES-DISSERTATION-2020.pdf (application/pdf, 6MB)

Title: Model-based feature construction and text representation for social media analysis
Author(s): Morales, Alex
Director of Research: Zhai, ChengXiang
Doctoral Committee Chair(s): Zhai, ChengXiang
Doctoral Committee Member(s): Han, Jiawei; Hockenmaier, Julia; Ungar, Lyle
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): machine learning; feature construction; feature development; truth discovery
Abstract: Text representation is at the foundation of most text-based applications. Surface features are insufficient for many tasks, so constructing powerful discriminative features in a general way remains an open challenge. Current approaches use deep neural networks to bypass feature construction. While deep learning can learn sophisticated representations from text, it requires large amounts of training data, which might not be readily available, and the derived features are not necessarily interpretable. In this work, we explore a novel paradigm, model-based feature construction (MBFC), that allows us to construct semantic features that can potentially improve many applications. In brief, MBFC uses human knowledge and expertise as well as big data to guide the design of models that enhance predictive modeling and support the data mining process by extracting useful knowledge, which in turn can be used as features for downstream prediction tasks. We show how this paradigm can be applied to several tasks in social media analysis. We explore how MBFC can be used to solve the problem of target misalignment for prediction, where the output variable and the data may be at different levels of resolution and the goal is to construct features that bridge this gap. The MBFC method allows us to use additional related data, e.g., associated context, to facilitate semantic analysis and feature construction. In this dissertation, we focus on a subset of problems in which social media data, in particular text data, can be leveraged to construct useful representations for prediction. We explore several kinds of user-generated content in social media, such as review data for useful review prediction, micro-blogging data for urgent health-based prediction tasks, and discussion forum data for expert prediction.
First, we propose a background mixture model to capture incongruity features in text and use these features for humor detection in restaurant reviews. Second, we propose a source reliability feature representation method for trustworthy comment identification that incorporates user aspect expertise when modeling fine-grained reliabilities in an online discussion forum. Finally, we propose multi-view attribute features that adapt MBFC to handle the target misalignment problem for topic-based features, and we apply them to tweets in order to forecast new diagnosis rates for sexually transmitted infections.
Issue Date: 2020-12-01
Rights Information: Copyright 2020 Alex Morales
Date Available in IDEALS: 2021-03-05
Date Deposited: 2020-12
