Files in this item



KARMAKERSANTU-DISSERTATION-2019.pdf (application/pdf, 5MB)


Title: Influence mining from unstructured big data
Author(s): Karmaker Santu, Shubhra Kanti
Director of Research: Zhai, ChengXiang
Doctoral Committee Chair(s): Zhai, ChengXiang
Doctoral Committee Member(s): Han, Jiawei; Sundaram, Hari; Ma, Hao
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Influence Mining
Hawkes Process
User Behavior Modeling
Text Generation
Community Influence
Evolving Text Stream
Unstructured Data
Event Analysis
Abstract: A crucial component of any intelligent system is the ability to understand and predict the behavior of its users. An accurate model of user behavior enables the system to serve users' needs more effectively. While much work has been done on user behavior modeling based on historical activity data, little attention has been paid to how external factors influence user behavior, which is clearly important for improving an intelligent system. The influence of external factors on user behavior is mostly reflected in two ways: 1) through significant growth in users' thirst for information related to external factors (e.g., a user may conduct many searches related to a popular event or to some community of interest), and 2) through user-generated content that is directly or indirectly related to the external factors (e.g., a user may tweet about a particular event). To capture these two aspects of user behavior, I introduce Influence Models for Information Thirst and for Content Generation, in that order, in this thesis. To the best of my knowledge, Influence Models for Information Thirst and Content Generation have not been studied before. The thesis starts by introducing a new data mining problem, namely how to mine the influence of real-world events on users' information thirst, which is important both for social science research and for designing better search engines. I solve this mining problem by first proposing computational measures that quantify the influence of an event on a query in order to identify triggered queries, and then proposing a novel extension of the Hawkes process to model the evolutionary trend of the influence of an event on search queries. Evaluation results using news articles and search log data show that the proposed approach is effective at identifying queries triggered by events reported in news articles and at characterizing the influence trend over time.
This influence model assumes that each event exerts its influence independently. This assumption is unrealistic, as many real-world events are correlated, influence each other, and therefore influence user search behavior jointly rather than independently. To relax this assumption, in the next part of my thesis, I propose a Joint Influence Model based on the Multivariate Hawkes Process, which captures the interdependence among multiple events in terms of their influence. An experimental study shows that the Joint Influence Model achieves higher accuracy than the independent model. The second way to observe external influence on user behavior is to analyze user-generated content that is directly or indirectly related to those external factors, which I discuss in the last part of the thesis. For example, user-generated content is often significantly influenced by the community to which the user belongs. While some work has been done on mining such influence from structured information networks, little attention has been paid to how to mine community influence from user-generated unstructured data. To study such influence, I introduce the problem of mining community influence from user-generated unstructured content, particularly in the context of text content generation. Although text generation has recently become a popular research topic following the surge of deep learning techniques, existing methods do not incorporate a community-influence factor into the generation process, and thus the processes do not evolve over time. This clearly limits their application to text stream data, since most text streams evolve over time, showing distinct patterns that correspond to the shifting interests of the target community. To address this limitation, I propose an Influenced Text Generation (ITG) Process that can capture the evolution of the text generation process in response to evolving community influence over time.
ITG is based on a deep learning architecture and uses LSTM cells within the hidden layers of a recurrent neural network. Experimental results on six independent text stream datasets composed of conference paper titles show that the proposed ITG method is effective in capturing the influence of different research communities on the paper titles generated by researchers.
Issue Date: 2019-04-17
Rights Information: Copyright 2019, Shubhra Kanti Karmaker Santu
Date Available in IDEALS: 2019-08-23
Date Deposited: 2019-05
