Title: Large Scale Information Integration on the Web: Finding, Understanding and Querying Web Databases
Author(s): Zhang, Zhen
Subject(s): computer science
Abstract: The Web has been rapidly "deepened" by myriad searchable online databases whose data are hidden behind query interfaces. Because they guard the data behind them, such query interfaces are the "entrances" or "doors" to the deep Web. To open these doors, we have been building the MetaQuerier system for both exploring (finding) and integrating (querying) databases on the Web through their query interfaces. To find Web databases, we need search functionality that dynamically discovers databases relevant to users' information needs. To query those Web databases, we need to "understand" what a query interface says, i.e., what query capabilities a source supports through its interface, in terms of specifiable conditions. Further, to help users query "alternative" sources, we need to mediate heterogeneous query capabilities across different sources discovered on the fly. Finally, to process queries submitted to a database, we need to design efficient query processing techniques. To address these challenges, this thesis presents several key components of the MetaQuerier system. First, a search facility searches for useful databases by their schemas. Second, a form extractor extracts the query capabilities of databases by applying a best-effort parsing approach based on hidden syntax. Third, a form assistant translates queries across pairs of interfaces on the fly by deploying a lightweight, domain-based translation framework. Fourth, the OPT* framework processes ranked queries as a k-constraint optimization problem. We evaluate our techniques on real Web databases. The experimental results show the promise of our system.
Issue Date: 2006-12
Genre: Technical Report
Other Identifier(s): UIUCDCS-R-2006-2795
Rights Information: You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format, BUT this permission is only for a period of 45 (forty-five) days from the most recent time that you verified that this technical report is still available from the University of Illinois at Urbana-Champaign Computer Science Department under terms that include this permission. All other rights are reserved by the author(s).
Date Available in IDEALS: 2009-04-21
