Files in this item



WANG-THESIS-2016.pdf (application/pdf, 741kB), Restricted Access


Title: TextDive: construction, summarization and exploration of multi-dimensional text corpora
Author(s): Wang, Qi
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): multi-dimensional text corpora analysis; text cube analysis; text summarization
Abstract: As massive datasets accumulate in text repositories (e.g., news articles and customer reviews), it is highly desirable to use and explore them systematically with data mining, NLP, and database techniques. In our view, documents in text corpora carry informative explicit meta-attributes (e.g., category, date, author) and implicit attributes (e.g., sentiment), which together form one or more highly structured multi-dimensional spaces. Much knowledge can be derived from these spaces given effective and efficient multi-dimensional summarization, exploration, and analysis technologies. In this demo, we propose TextDive, an end-to-end, real-time analytical platform that processes massive text data and provides valuable insights to general data consumers. First, we develop a set of information extraction, entity typing, and text mining methods to extract consolidated dimensions and automatically construct multi-dimensional textual spaces (i.e., text cubes). Second, we develop a set of OLAP-like text summarization, data exploration, and text analysis mechanisms that understand the semantics of text corpora in multi-dimensional spaces. Finally, we develop an efficient computational solution that materializes selected statistics to guarantee the interactive, real-time nature of TextDive.
Issue Date: 2016-04-20
Rights Information: Copyright 2016 Qi Wang
Date Available in IDEALS: 2016-07-07
Date Deposited: 2016-05
