Files in this item



Sehrawat_Nipun.pdf (application/pdf, 826kB)


Title: A study of the impact of global statistics in distributed information retrieval
Author(s): Sehrawat, Nipun
Advisor(s): Zhai, ChengXiang
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Distributed Information Retrieval; Global Statistics; Retrieval Performance
Abstract: Today’s information retrieval systems must handle very large data collections and take a distributed approach to achieve scalable retrieval performance. The most widely used approach, called document partitioning, partitions the data among multiple search nodes, each of which indexes its sub-collection independently and is responsible for scoring the documents in its index against queries. Most widely used document scoring functions depend on global (collection-wide) statistics, such as the document frequency of terms. However, because search nodes do not have access to global statistics and rely on local (sub-collection-wide) statistics for scoring, document partitioning can degrade retrieval performance. In this thesis, we study the impact of the lack of global statistics on the retrieval performance of a distributed information retrieval (DIR) system. Our experiments show that performance, as indicated by multiple measures, degrades as the number of search nodes is increased. We thus conclude that global statistics are essential to retrieval performance in a distributed setup. Finally, we present a novel scheme for lazy and adaptive dissemination of global statistics in a document-partitioned DIR system.
Issue Date: 2012-06-27
Rights Information: Copyright 2012 Nipun Sehrawat
Date Available in IDEALS: 2014-06-28
Date Deposited: 2012-05
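
The abstract's central point, that a search node scoring with local statistics can weight a term differently than it would with collection-wide statistics, can be illustrated with a minimal sketch. The toy documents, the two-shard split, the helper names, and the smoothed IDF formula below are all assumptions chosen for illustration; they are not taken from the thesis and do not reproduce its experiments or its dissemination scheme.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Count, for each term, how many documents in `docs` contain it."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.split()))
    return df


def idf(term, df, num_docs):
    """Smoothed inverse document frequency: log((N + 1) / (df + 1)) + 1.

    This particular smoothing is an illustrative choice, not the thesis's.
    """
    return math.log((num_docs + 1) / (df.get(term, 0) + 1)) + 1


# Hypothetical toy collection, document-partitioned across two search nodes.
shard_a = ["apple banana", "apple cherry", "apple date"]
shard_b = ["banana cherry", "cherry date", "date fig"]

# Each node can only compute statistics over its own sub-collection.
df_a = document_frequency(shard_a)
df_b = document_frequency(shard_b)

# Document frequencies add up across disjoint shards, which gives the
# global statistic that no single node sees on its own.
df_global = df_a + df_b

idf_local_a = idf("apple", df_a, len(shard_a))
idf_local_b = idf("apple", df_b, len(shard_b))
idf_global = idf("apple", df_global, len(shard_a) + len(shard_b))

print(f"local IDF on node A: {idf_local_a:.3f}")
print(f"local IDF on node B: {idf_local_b:.3f}")
print(f"global IDF:          {idf_global:.3f}")
```

In this toy split, "apple" appears in every document on node A and in none on node B, so the two nodes assign it very different weights, and neither matches the weight derived from the whole collection. Scores produced with such differing weights are not directly comparable when results from the nodes are merged.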
