|Abstract:||The explosion of internet usage has provided users with access to information in an unprecedented scale-- The data retrieval problem of finding relevant data has thus become a clear challenge. Such retrieval, with the large scale of data, has naturally demanded ranked answers, or ``best first,'' to enable users to focus on a few top results.
This thesis presents techniques to support this ranked data retrieval efficiently and effectively First, efficient processing: As data retrieval scenarios naturally situate in large dataset, such retrieval should be amenable to efficient processing. Second, intuitive formulation: Data retrieval should be intuitive for ordinary users to easily express their ranking criteria.
Toward this goal, we first study how to support efficient rank processing. Existing rank processing algorithms are all designed with specific access scenario in mind and follow statically designed behaviors. As a result, it is often unclear how to support data retrieval scenarios involving sources with heterogeneous access capabilities and dynamically changing costs. In contrast, we develop a systematic ``cost-based" optimization-- By dynamic search over a space of algorithms, cost-based optimization is general across a wide range of access scenarios, yet adaptive to any specific scenario at runtime. We develop this cost-based optimization framework, first for an important and fundamental special case of ``expensive predicates", and then extend for any arbitrary scenarios.
Second, we study how to support intuitive rank formulation. While existing rank processing algorithms require a global ranking function F, it is far from trivial for everyday user to articulate such a function F evaluating each and every object into a numerical score. We thus develop an interactive ``rank-by-examples'' paradigm to support intuitive rank formulation. In particular, we apply a machine learning approach to infer the underlying ranking function from a few given examples.
While building this system, we also observe that ranking functions learned often turn out to be quite complicated. Meanwhile, rank processing techniques have been focusing on supporting monotonic functions. We address this new challenge by developing a fresh new perspective abstracting query answering as an optimization problem, which enables to extend rank processing for any arbitrary functions.