Files in this item

2_Hauguel_Samson.pdf (949kB); format: PDF; description: none provided

Description

Title: Discovery driven analysis on semi-structured text data
Author(s): Hauguel, Samson A.
Advisor(s): Zhai, ChengXiang
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: M.S.
Genre: Thesis
Subject(s): computer science; Information Science; Data Mining; Text Mining; Online analytical processing (OLAP); discovery driven analysis; Probabilistic latent semantic analysis (PLSA)
Abstract: Discovery Driven Analysis (DDA) is a common feature of OLAP tools for analyzing structured data. In essence, DDA helps analysts discover anomalous data by highlighting 'unexpected' values in the OLAP cube. By indicating which dimensions are worth exploring, DDA speeds up the discovery of anomalies and their causes. However, DDA (and OLAP in general) applies only to structured data, such as records in databases. We propose a system that extends DDA to semi-structured text documents, that is, text documents accompanied by a small amount of structured data. Our system pipeline has two stages: first, the text part of each document is structured around user-specified dimensions using the semi-PLSA algorithm; then, DDA is adapted to these fully structured documents, thus enabling DDA on text documents. We present some applications of this system in OLAP analysis and show how scalability issues are addressed. Results show that our system can handle document collections of reasonable size in real time, without any need for pre-computation.
Issue Date: 2010-05-19
URI: http://hdl.handle.net/2142/16180
Rights Information: Copyright 2010 Samson A. Hauguel
Date Available in IDEALS: 2010-05-19
Date Deposited: May 2010
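
As a rough illustration of what "highlighting 'unexpected' values in the OLAP cube" can mean, the sketch below scores each cell of a small two-dimensional cube against an additive row-plus-column model and flags cells whose standardized residual exceeds a threshold. This is a simplified stand-in, not the method used in the thesis: the additive expectation model, the global standard deviation, the threshold of 2.0, the function names `surprise_scores` and `flag_exceptions`, and the document counts (months by topic labels, as if assigned to each document by a text-structuring step such as semi-PLSA) are all hypothetical.

```python
from statistics import pstdev


def surprise_scores(cube):
    """Standardized residual of each cell under an additive row + column model.

    Hypothetical, simplified stand-in for a DDA 'expected value' model:
    expected[i][j] = row_mean[i] + col_mean[j] - grand_mean.
    """
    n_rows, n_cols = len(cube), len(cube[0])
    grand = sum(sum(row) for row in cube) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in cube]
    col_means = [sum(cube[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    residuals = [
        [cube[i][j] - (row_means[i] + col_means[j] - grand) for j in range(n_cols)]
        for i in range(n_rows)
    ]
    # One global scale for every cell; a perfectly additive cube has no surprises.
    sigma = pstdev([r for row in residuals for r in row])
    if sigma == 0:
        return [[0.0] * n_cols for _ in range(n_rows)]
    return [[r / sigma for r in row] for row in residuals]


def flag_exceptions(cube, threshold=2.0):
    """Return (row, col, score) for cells whose |score| exceeds the threshold."""
    return [
        (i, j, round(s, 2))
        for i, row in enumerate(surprise_scores(cube))
        for j, s in enumerate(row)
        if abs(s) > threshold
    ]


# Hypothetical document counts: rows = months (a structured field),
# columns = topic labels assigned to each document's text.
counts = [
    [10, 15, 20, 25],
    [20, 25, 30, 35],
    [30, 59, 40, 45],  # row 2, column 1 is 24 documents above the additive pattern
    [40, 45, 50, 55],
]
print(flag_exceptions(counts))  # [(2, 1, 3.0)]
```

With a single global scale, an isolated spike in an R × C cube always scores exactly sqrt((R-1)(C-1)) under this scheme, whatever its size (3.0 in the 4 × 4 example above), so the threshold has to be chosen with the cube's shape in mind.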

