Files in this item



application/pdf: Contextual Inde ... Scalable Entity Search.pdf (309kB)


Title: Contextual Indexing and Joining: Supporting Efficient, Scalable Entity Search
Author(s): Cheng, Tao; Chang, Kevin Chen-Chuan
Subject(s): computer science
Abstract: As the Web has evolved into an entity-abundant repository, current search engines, with their standard "page view", are becoming increasingly inadequate for a wide range of query tasks. Entity search, a significant departure from document retrieval, finds fine-granularity information, i.e., entities, embedded in documents directly and holistically across the whole collection. Essentially, entity search finds matching entities by context patterns in each document and aggregates them across documents for ranking. This text-based pattern matching suggests that standard inverted-list-based query processing can be applied. However, this baseline is limited in both efficiency, due to long entity lists, and scalability, due to cross-document aggregation. To enhance efficiency, we propose the "contextual index", an index that materializes pre-joins, to eliminate unnecessary index reading and reduce online matching. To improve scalability, we propose "entity-space" partitioning, so that answer subspaces can be aggregated locally. We derive our design rationale from both the functional and the operational definitions of entity search, and show that both lead consistently to our framework. We evaluate the indexing (contextual indexing) and parallel query processing (contextual joining) framework over a 2TB real Web corpus with systematic benchmark query sets. Experiments show that our scheme can speed up query processing by two orders of magnitude on average over the baseline.
Issue Date: 2007-10
Genre: Technical Report
Other Identifier(s): UIUCDCS-R-2007-2911
Rights Information: You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format, BUT this permission is only for a period of 45 (forty-five) days from the most recent time that you verified that this technical report is still available from the University of Illinois at Urbana-Champaign Computer Science Department under terms that include this permission. All other rights are reserved by the author(s).
Date Available in IDEALS: 2009-04-22
