|Abstract:||Since the birth of the Web, the amount of available information has grown rapidly. Such a huge amount of information poses significant challenges for text information management. Search engines are by far the most powerful tools for helping users find information, and their accuracy significantly affects our productivity and quality of life. Text retrieval is the underlying research problem behind all search engines. An improved text retrieval model enables every search engine to achieve higher search accuracy.
This thesis presents a novel axiomatic framework for studying and developing more robust and effective text retrieval models. Existing retrieval models all model relevance indirectly, which prevents us from understanding what makes a retrieval function perform well. As a result, we have to rely on heavy parameter tuning to optimize retrieval performance. To overcome this limitation, the proposed axiomatic framework models relevance directly with a set of retrieval constraints (i.e., axioms). Our approach is motivated by the empirical observation that good retrieval performance is closely related to the use of various retrieval heuristics. We formalize these retrieval heuristics as constraints, and use them to guide both the diagnosis of the weaknesses and strengths of a retrieval function and the development of more robust and effective retrieval functions in a principled way. Experiments show three major benefits of the proposed axiomatic approach. First, it allows us to diagnose the weaknesses and strengths of retrieval functions both analytically and empirically, and the performance of retrieval functions can be improved based on the diagnostic results. Second, the axiomatic approach makes it possible to derive more robust and effective retrieval functions. The newly derived retrieval functions are less sensitive to parameter settings than existing retrieval functions, with comparable optimal performance. Third, the axiomatic approach provides an easy way to incorporate additional information, such as semantic term matching, to further improve retrieval performance.
The axiomatic framework opens up many promising new directions for studying and developing more robust and effective retrieval functions. Since relevance is modeled directly through retrieval constraints, the framework enables us to understand relevance theoretically and to predict the performance of a retrieval function analytically. The framework also facilitates diagnostic analysis of retrieval functions, and can thus provide guidance on how to eventually develop an optimal retrieval function.
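
As a rough illustration of what diagnosing a retrieval function against constraints might look like, the following is a minimal sketch rather than the method of the thesis. The two constraints (a term-frequency constraint and a length constraint), both scoring functions, and all names and parameter values are assumptions introduced here for illustration only.

```python
import math


def tfidf_score(tf, doc_len, avg_len, df, n_docs, s=0.2):
    """Illustrative single-term score with length normalization (hypothetical)."""
    if tf == 0:
        return 0.0
    tf_part = 1.0 + math.log(1.0 + math.log(tf))
    length_norm = (1.0 - s) + s * doc_len / avg_len
    idf = math.log((n_docs + 1) / df)
    return tf_part / length_norm * idf


def length_biased_score(tf, doc_len, avg_len, df, n_docs):
    """Deliberately flawed score that rewards longer documents (hypothetical)."""
    return tf * (doc_len / avg_len) * math.log((n_docs + 1) / df)


def strictly_increasing(values):
    """True if every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


def diagnose(score, avg_len=100, df=10, n_docs=1000):
    """Empirically check two assumed constraints on a small grid of inputs."""
    stats = dict(avg_len=avg_len, df=df, n_docs=n_docs)
    # Term-frequency constraint: at equal length, more occurrences of a
    # query term should give a higher score.
    by_tf = [score(tf=tf, doc_len=avg_len, **stats) for tf in range(0, 21)]
    # Length constraint: at equal term count, a longer document should
    # give a lower score.
    by_len = [score(tf=3, doc_len=n, **stats) for n in (50, 100, 200, 400)]
    return {
        "tf_constraint": strictly_increasing(by_tf),
        "length_constraint": strictly_increasing(by_len[::-1]),
    }


print(diagnose(tfidf_score))
print(diagnose(length_biased_score))
```

Under these assumptions, the first function passes both checks while the second fails the length check, which is the kind of weakness a constraint-based diagnosis is meant to expose. A grid check of this kind is only empirical; an analytical diagnosis would instead reason about the scoring formula itself.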