Files in this item



application/pdf: 107_ready.pdf (233kB)
(no description provided)


Title: Targeted Query Expansions as a Method for Searching Mixed Quality Digitized Cultural Heritage Documents
Author(s): Keskustalo, Heikki; Kettunen, Kimmo; Kumpulainen, Sanna; Ferro, Nicola; Silvello, Gianmaria; Järvelin, Anni; Kekäläinen, Jaana; Arvola, Paavo; Sormunen, Eero; Järvelin, Kalervo; Saastamoinen, Miamaria
Subject(s): cultural institutions; information seeking/retrieval; archives and records
Abstract: Digitization of cultural heritage is a huge ongoing effort in many countries. In digitized historical documents, words may occur in different surface forms due to three types of variation: morphological variation, historical variation, and errors in optical character recognition (OCR). Because individual documents may differ significantly from each other in the level of such variation, digitized collections may contain documents of mixed quality. Such different types of documents may require different types of retrieval methods. We suggest using targeted query expansions (QE) to access documents in mixed-quality text collections. In QE, the user-given search term is replaced by a set of expansion keys (search words); in targeted QE, the selection of expansion terms is based on the type of surface-level variation occurring in the particular text searched. We illustrate our approach in a highly inflectional, compounding language, Finnish, although such variation occurs across all natural languages. We report a minimal-scale experiment based on the QE method and discuss the need to support targeted QEs in the search interface.
Issue Date: 2015-03-15
Series/Report: iConference 2015 Proceedings
Genre: Conference Paper / Presentation
Peer Reviewed: yes
Rights Information: Copyright 2015 is held by the authors. Copyright permissions, when appropriate, must be obtained directly from the authors.
Date Available in IDEALS: 2015-03-23
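
The targeted QE idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`morphological_variants`, `historical_variants`, `ocr_variants`, `targeted_expand`), the suffix list, the v/w spelling rule, and the OCR confusion pairs are all assumptions chosen only to show how the choice of expansion keys might depend on the type of variation present in the text being searched.

```python
from typing import Callable, Dict, List

# Hypothetical toy generators, one per variation type named in the abstract.
# The concrete rules below are illustrative assumptions, not taken from the paper.

def morphological_variants(term: str) -> List[str]:
    # Assumed: append a few Finnish case endings to the search term.
    suffixes = ["", "n", "ssa", "sta", "lla"]
    return [term + suffix for suffix in suffixes]

def historical_variants(term: str) -> List[str]:
    # Assumed: older spelling may use "w" where modern spelling uses "v".
    variants = [term]
    if "v" in term:
        variants.append(term.replace("v", "w"))
    return variants

def ocr_variants(term: str) -> List[str]:
    # Assumed single-character OCR confusions, applied one position at a time.
    confusions = {"l": "1", "o": "0", "e": "c"}
    variants = [term]
    for i, ch in enumerate(term):
        if ch in confusions:
            variants.append(term[:i] + confusions[ch] + term[i + 1:])
    return variants

GENERATORS: Dict[str, Callable[[str], List[str]]] = {
    "morphological": morphological_variants,
    "historical": historical_variants,
    "ocr": ocr_variants,
}

def targeted_expand(term: str, variation_types: List[str]) -> List[str]:
    """Replace a search term with expansion keys for the given variation types only."""
    keys: List[str] = []
    for variation_type in variation_types:
        for key in GENERATORS[variation_type](term):
            if key not in keys:  # keep first occurrence, preserve order
                keys.append(key)
    return keys

if __name__ == "__main__":
    # A clean modern text might need only morphological expansion,
    # while a noisy historical scan might need historical and OCR expansion.
    print(targeted_expand("talo", ["morphological"]))
    print(targeted_expand("vene", ["historical", "ocr"]))
```

In this sketch the caller picks the variation types per document or collection; how that choice would be made or exposed in a search interface is left open here.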
