Files in this item



Conceptual Search and Text Categorization.pdf (application/pdf, 226kB)


Title: Conceptual Search and Text Categorization
Author(s): Ratinov, Lev; Roth, Dan; Srikumar, Vivek
Subject(s): machine learning; natural language processing
Abstract: The most fundamental problem in information retrieval is that of interpreting the information needs of users, typically expressed in a short query. Using the surface-level representation of the query is especially unsatisfactory when the information needs are topic-specific, such as "US politics" or "Space Science", which seem to require understanding what the query means rather than what it is. We suggest that a newly proposed semantic representation of words (Gabrilovich and Markovitch, 2007) can be used to support Conceptual Search. Namely, it allows retrieving documents on a given topic even when existing keyword-based search approaches fail. The method we develop allows us to categorize and retrieve documents topically on the fly, without looking at the data collection ahead of time, without knowing a priori the topics of interest, and without training topic categorization classifiers. We compare our approach experimentally to state-of-the-art IR techniques and to machine-learning-based text categorization techniques and demonstrate significant improvement in performance. Moreover, as we show, our method is intrinsically adaptable to new text collections and domains.
Issue Date: 2008-01
Genre: Technical Report
Other Identifier(s): UIUCDCS-R-2008-2932
Rights Information: You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format, but this permission is only for a period of 45 (forty-five) days from the most recent time that you verified that this technical report is still available from the University of Illinois at Urbana-Champaign Computer Science Department under terms that include this permission. All other rights are reserved by the author(s).
Date Available in IDEALS: 2009-04-22