## Files in this item

| File | Description | Format |
| --- | --- | --- |
| Kotov_Alexander.pdf (652 kB) | (no description provided) | PDF (application/pdf) |

## Description

Title: Leveraging user interaction to improve search experience with difficult and exploratory queries
Author(s): Kotov, Alexander
Director of Research: Zhai, ChengXiang
Doctoral Committee Member(s): Han, Jiawei; Chang, Kevin C-C.; Si, Luo
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: Ph.D.
Genre: Dissertation
Subject(s): Information Retrieval; Interactive Feedback

Abstract: The query-based search paradigm is based on the assumption that searchers are able to come up with effective differentiator terms to make their queries specific and precise. In reality, however, a large number of queries are problematic: they return either too many or no relevant documents in the initial search results. Existing search systems provide no assistance to users when they cannot formulate an effective keyword query and receive poor-quality search results. In some cases, users may intentionally formulate broad or exploratory queries (for example, when they want to explore a particular topic without having a clear search goal). In other cases, users may not know the domain of the search problem sufficiently well, and their queries may suffer from problems they are unaware of, such as ambiguity or vocabulary mismatch. Although the quality of search results can be improved by reformulating queries, finding a good reformulation is often non-trivial and takes time.
Therefore, in addition to the existing work on using relevant documents from the top-ranked initially retrieved results to retrieve more relevant documents, it is important, from both theoretical and practical points of view, to also develop an interactive retrieval model that allows search systems to improve the users' search experience with exploratory queries, which return too many relevant documents, and difficult queries, which return no relevant documents in the initial search results. In this thesis, we propose and study three methods for interactive feedback that allow search systems to interactively improve the quality of retrieval results for difficult and exploratory queries: question feedback, sense feedback, and concept feedback. All three methods are based on a novel question-guided interactive retrieval model, in which a search system collaborates with the users in achieving their search goals by generating natural language refinement questions. The first method, *question feedback*, is aimed at interactive refinement of short, exploratory keyword-based queries by automatically generating a list of clarification questions, which can be presented next to the standard ranked list of retrieved documents. Clarification questions place broad query terms into a specific context and help the user focus on and explore a particular aspect of the query topic. By clicking on a question, the user is presented with its answer, and by clicking on the answer the user can be redirected to the document containing the answer for further exploration. Clarification questions can therefore be considered shortcuts to specific answers. Questions also provide a more natural mechanism for eliciting relevance feedback from users. A query can be expanded by adding terms from the clicked question and resubmitted to the search system, generating a new set of questions and documents retrieved with the expanded query.
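The expansion step described above can be sketched in a few lines. This is a minimal, hypothetical illustration of expanding a keyword query with the content terms of a clicked clarification question; the `expand_query` helper and the stopword list are assumptions for the example, not the thesis's actual implementation.

```python
import re

# A tiny stopword list for the illustration; a real system would use a
# fuller list or term weighting.
STOPWORDS = {"what", "is", "the", "of", "a", "an", "in", "how", "did", "does", "to"}

def expand_query(query, clicked_question):
    """Expand a keyword query with the content terms of the clarification
    question the user clicked (illustrative sketch only)."""
    query_terms = query.lower().split()
    question_terms = re.findall(r"[a-z]+", clicked_question.lower())
    new_terms = [t for t in question_terms
                 if t not in STOPWORDS and t not in query_terms]
    return query_terms + new_terms

# A broad one-word query refined by a clicked clarification question:
print(expand_query("jaguar", "What is the top speed of the jaguar cat?"))
# → ['jaguar', 'top', 'speed', 'cat']
```

The expanded query would then be resubmitted, producing a new ranked list and a new set of clarification questions.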
Enabling interactive question-based retrieval requires major changes to all components of the retrieval process, from more sophisticated methods of content analysis to ranking and feedback. Specifically, we propose methods to locate and index content that can be used for question generation, and to generate and rank well-formed, meaningful questions in response to user queries. We implemented a prototype of a question-guided search system on a subset of Wikipedia and conducted user studies, which demonstrated the effectiveness of the question-based feedback strategy. The second method, *sense feedback*, is aimed at clarifying the intended sense of ambiguous query terms with automatically generated clarification questions of the form "Did you mean {ambiguous query term} as {sense label}?", where the sense label can be a single term or a phrase. Our approach to sense detection is based on the assumption that the senses of a word can be differentiated by grouping and analyzing all the contexts in which the word appears in the collection. We propose to detect the senses of a query term by clustering the global (collection-wide) graph of relationships between the query term and other terms in the collection vocabulary. We conducted simulation experiments with two graph clustering algorithms and two methods for calculating the strength of the relationship between terms in the graph, in order to determine the upper bound on the retrieval effectiveness of sense feedback and the best method for detecting senses. We also proposed several alternative methods for representing the discovered senses and conducted a user study to evaluate the effectiveness of each representation method against the actual retrieval performance of user sense selections. The third method, *concept feedback*, utilizes ConceptNet, an online commonsense knowledge base and natural language processing toolkit.
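The sense-detection idea above can be illustrated with a simplified sketch: build a graph of terms co-occurring with the ambiguous query term, then cluster it so that each cluster approximates one sense. The thesis evaluates two dedicated graph clustering algorithms and two edge-weighting schemes; as a stand-in, this hypothetical example clusters by connected components of the thresholded co-occurrence graph.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_graph(docs, target):
    """Weighted graph over terms that co-occur with `target`, using
    document-level co-occurrence counts (a simplification of the
    collection-wide relationship graph described in the thesis)."""
    weights = defaultdict(int)
    for doc in docs:
        terms = set(doc.lower().split())
        if target in terms:
            terms.discard(target)
            for a, b in combinations(sorted(terms), 2):
                weights[(a, b)] += 1
    return weights

def detect_senses(docs, target, min_weight=1):
    """Cluster the target's context graph into connected components;
    each component approximates one sense of the target term."""
    weights = cooccurrence_graph(docs, target)
    # Union-find over edges whose weight clears the threshold.
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for (a, b), w in weights.items():
        if w >= min_weight:
            parent[find(a)] = find(b)
    senses = defaultdict(set)
    for term in parent:
        senses[find(term)].add(term)
    return list(senses.values())

# Toy collection with two senses of "jaguar" (animal vs. car):
docs = [
    "jaguar cat jungle predator",
    "jaguar predator cat",
    "jaguar car engine luxury",
    "jaguar luxury car",
]
for sense in detect_senses(docs, "jaguar", min_weight=2):
    print(sorted(sense))
```

Each printed cluster could then be summarized into a sense label (e.g. by its most central term) and offered to the user in a "Did you mean … as …?" question.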
Unlike ontologies and other knowledge bases such as WordNet and Wikipedia, ConceptNet is not limited to hyponym/hypernym relations: it features a more diverse relational ontology as well as a graph-based knowledge representation model, which enables more complex textual inferences. First, we conducted simulation experiments in which each query term was expanded with related concepts from ConceptNet; these demonstrated the considerable upper-bound potential of tapping into a knowledge base to overcome the lack of positive relevance signals in the initial retrieval results for difficult queries. Second, we proposed and experimentally evaluated heuristic and machine-learning-based methods for selecting a small number of candidate concepts for query expansion. Experimental results on multiple data sets indicate that concept feedback can effectively improve the retrieval performance of difficult queries, both in isolation and in combination with pseudo-relevance feedback.

Issue Date: 2012-02-01
Genre: Dissertation / Thesis
URI: http://hdl.handle.net/2142/29476
Rights Information: Copyright 2011 Alexander Kotov
Date Available in IDEALS: 2014-02-01
Date Deposited: 2011-12