Files in this item

File:Hwang_Hsiang-Yeh.pdf (3MB)
Description:(no description provided)
Format:PDF (application/pdf)

Description

Title:Mining informative sentences in multi-product documents with PLSA
Author(s):Hwang, Hsiang-Yeh
Advisor(s):Zhai, ChengXiang
Department / Program:Computer Science
Discipline:Computer Science
Degree Granting Institution:University of Illinois at Urbana-Champaign
Degree:M.S.
Genre:Thesis
Subject(s):multi-product document
single-product document
informative sentences
informative words (i.e., keywords)
product name entity
Abstract:In this thesis, we study the problem of extracting informative sentences from multi-product documents. A multi-product document is a document that mentions multiple products, often for the purpose of comparison. An informative sentence is a sentence that describes the characteristics of a product. We propose to use Probabilistic Latent Semantic Analysis (PLSA) to mine informative sentences from a single multi-product document. Applying PLSA to a multi-product document simultaneously solves three problems: (1) separating the document by product, (2) extracting the informative keywords for each product, and (3) extracting the informative sentences. The proposed method mines product information from a single multi-product document, which distinguishes it from previous work in opinion mining. Experimental results show that the high-probability words in each product's word distribution do capture the characteristics of that product. The results also show that sentences containing more of these keywords are more informative. Practical applications of this method include (1) product comparison and (2) helping users read multi-product documents, an application of data mining to human-computer interaction (HCI).
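The pipeline described in the abstract can be illustrated with a rough sketch. This is not the thesis's implementation: it assumes each sentence is treated as one PLSA unit, that one latent topic stands in for one product, and that a sentence's informativeness is simply its count of top-probability keywords. The function names (`plsa`, `top_words`, `score_sentence`) and the toy sentences are hypothetical.

```python
import random
from collections import Counter


def plsa(sentences, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM, treating each tokenized sentence as one unit (assumption).

    Returns (p_w_z, p_z_d): p_w_z[k] maps word -> P(w | z=k),
    and p_z_d[d][k] is P(z=k | sentence d).
    """
    rng = random.Random(seed)
    counts = [Counter(s) for s in sentences]
    vocab = sorted({w for s in sentences for w in s})

    def normalize(values):
        total = sum(values)
        return [v / total for v in values]

    # Random initialization of P(w|z) and P(z|d).
    p_w_z = []
    for _ in range(n_topics):
        probs = normalize([rng.random() + 0.1 for _ in vocab])
        p_w_z.append(dict(zip(vocab, probs)))
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in sentences]

    for _ in range(n_iter):
        # Small constants keep every probability strictly positive.
        new_w_z = [dict.fromkeys(vocab, 1e-12) for _ in range(n_topics)]
        new_z_d = [[1e-12] * n_topics for _ in sentences]
        for d, cnt in enumerate(counts):
            for w, n in cnt.items():
                # E-step: P(z | d, w) is proportional to P(w|z) * P(z|d).
                post = normalize([p_w_z[k][w] * p_z_d[d][k]
                                  for k in range(n_topics)])
                # Accumulate expected counts for the M-step.
                for k in range(n_topics):
                    new_w_z[k][w] += n * post[k]
                    new_z_d[d][k] += n * post[k]
        # M-step: renormalize the expected counts into distributions.
        for k in range(n_topics):
            total = sum(new_w_z[k].values())
            p_w_z[k] = {w: v / total for w, v in new_w_z[k].items()}
        p_z_d = [normalize(row) for row in new_z_d]
    return p_w_z, p_z_d


def top_words(p_w_z, k, n=5):
    """Return the n highest-probability words of topic k (the 'keywords')."""
    return sorted(p_w_z[k], key=p_w_z[k].get, reverse=True)[:n]


def score_sentence(tokens, keywords):
    """Count keyword occurrences; more keywords is taken as more informative."""
    return sum(1 for t in tokens if t in keywords)


# Hypothetical toy document mentioning two made-up products.
toy_sentences = [
    "alpha phone has a bright screen".split(),
    "alpha phone battery lasts long".split(),
    "beta tablet has a large screen".split(),
    "beta tablet battery charges fast".split(),
]
word_dists, sentence_topics = plsa(toy_sentences, n_topics=2)
for k in range(2):
    keywords = set(top_words(word_dists, k, n=3))
    best = max(toy_sentences, key=lambda s: score_sentence(s, keywords))
    print(k, sorted(keywords), " ".join(best))
```

Because EM starts from a random initialization, the topics found on such a small example need not line up cleanly with the two products; a real document would also need tokenization and stop-word handling, which this sketch omits.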
Issue Date:2010-08-20
URI:http://hdl.handle.net/2142/16897
Rights Information:Copyright 2010 Hsiang-Yeh Hwang
Date Available in IDEALS:2010-08-20
Date Deposited:2010-08