Files in this item



Tensor Space Model for Document Analysis.pdf (application/pdf, 234kB)


Title: Tensor Space Model for Document Analysis
Author(s): Cai, Deng; He, Xiaofei; Han, Jiawei
Subject(s): computer science
Abstract: The Vector Space Model (VSM) has been at the core of information retrieval for the past decades. VSM treats documents as vectors in a high-dimensional space. In such a vector space, techniques like Latent Semantic Indexing (LSI), Support Vector Machines (SVM), Naive Bayes, etc., can then be applied for indexing and classification. However, in some cases the dimensionality of the document space might be extremely large, which makes these techniques infeasible due to the curse of dimensionality. In this paper, we propose a novel Tensor Space Model for document analysis. We represent documents as second-order tensors, or matrices. Correspondingly, a novel indexing algorithm called Tensor Latent Semantic Indexing (TensorLSI) is developed in the tensor space. Our theoretical analysis shows that TensorLSI is much more computationally efficient than conventional Latent Semantic Indexing, which makes it applicable to extremely large-scale data sets. Several experimental results on standard document data sets demonstrate the efficiency and effectiveness of our algorithm.
Issue Date: 2006-04
Genre: Technical Report
Other Identifier(s): UIUCDCS-R-2006-2715
Rights Information: You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format, BUT this permission is only for a period of 45 (forty-five) days from the most recent time that you verified that this technical report is still available from the University of Illinois at Urbana-Champaign Computer Science Department under terms that include this permission. All other rights are reserved by the author(s).
Date Available in IDEALS: 2009-04-21
