Files in this item



application/pdfMOHITVYAS-THESIS-2020.pdf (731kB)
(no description provided)PDF


Title:Index design for information retrieval applications using database concepts
Author(s):Mohit Vyas, -
Advisor(s):Chang, Kevin
Department / Program:Computer Science
Discipline:Computer Science
Degree Granting Institution:University of Illinois at Urbana-Champaign
Subject(s):Information Retrieval
Index Design
Entity Search
Abstract:Integrating database (DB) and information retrieval (IR) technologies has long been an important problem. Recent growth in tagged textual data, which is a combination of structured knowledge bases and unstructured text data, has made principled index design for IR applications even more desirable. In this work, we propose a framework for designing IR applications using DB concepts. We model an IR application as a specialized DB system that is required to support only a fixed predefined query workload rather than general SQL queries. The physical design of an information retrieval system is optimized for a prespecified target query workload. We then identify IR indexes as basic building blocks of the design of an IR application. An IR index is formalized as a look-up function built over a materialized view. The overall ”index design problem” is then to find an index design with lowest cost from a space of candidate designs. Since the space of candidate designs can be doubly exponential in the size of query workload, we give polynomial time heuristic algorithm with provable guarantees for a weighted set cover based relaxation of the original problem. This is then combined with an overarching branch-and-bound based optimality preserving state-space search algorithm that efficiently prunes the state-space by using the above heuristic algorithm. Finally, we show via experiments how the proposed framework can be used to obtain index designs for different kinds IR applications in practice.
Issue Date:2020-05-15
Rights Information:Copyright 2020 - Mohit Vyas
Date Available in IDEALS:2020-08-26
Date Deposited:2020-05

This item appears in the following Collection(s)

Item Statistics