Title: Effects of Inconsistent Relevance Judgments on Information Retrieval Test Results: A Historical Perspective
Author(s): Saracevic, Tefko
Subject(s): Lancaster, F. Wilfrid (Frederick Wilfrid), 1933-
Abstract: The main objective of information retrieval (IR) systems is to retrieve information or information objects relevant to user requests and possible needs. In IR tests, retrieval effectiveness is established by comparing IR systems’ retrievals (systems relevance) with users’ or user surrogates’ assessments (user relevance), where user relevance is treated as the gold standard for performance evaluation. Relevance is a human notion, and establishing relevance by humans is fraught with a number of problems, inconsistency in judgment being one of them. The aim of this critical review is to explore the relationship between relevance on the one hand and testing of IR systems and procedures on the other. Critics of IR tests raised the issue of validity of the IR tests because they were based on relevance judgments that are inconsistent. This review traces and synthesizes experimental studies dealing with (1) inconsistency of relevance judgments by people, (2) effects of such inconsistency on results of IR tests, and (3) reasons for retrieval failures. A historical context for these studies and for IR testing is provided, including an assessment of Lancaster’s (1969) evaluation of MEDLARS and its unique place in the history of IR evaluation.
Issue Date: 2008
Publisher: Johns Hopkins University Press and the Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign
Citation Info: In Library Trends 56 (4) Winter 2008: 763-783
Publication Status: published or submitted for publication
Rights Information: Copyright 2008 Board of Trustees of the University of Illinois
Date Available in IDEALS: 2009-01-22
