Files in this item

File: 3270067.pdf (5MB), application/pdf, Restricted to U of Illinois
File description: (no description provided)
Format: PDF

Description

Title: Large Scale Information Integration on the Web: Finding, Understanding and Querying Web Databases
Author(s): Zhang, Zhen
Doctoral Committee Chair(s): Chang, Kevin C.
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: Ph.D.
Genre: Dissertation
Subject(s): Computer Science
Abstract: The Web has been rapidly "deepened" by myriad searchable databases online, where data are hidden behind query interfaces. Guarding the data behind them, such query interfaces are the "entrances" or "doors" to the deep Web. To open this door to the deep Web, we have been building the MetaQuerier system---for both exploring (to find) and integrating (to query) databases on the Web through their query interfaces. To find Web databases, we need to provide search functionalities that dynamically discover databases relevant to users' information needs. To query those Web databases, we need to "understand" what a query interface says---i.e., what query capabilities a source supports through its interface, in terms of specifiable conditions. Further, to help users query "alternative" sources, we need to mediate heterogeneous query capabilities across different sources discovered on-the-fly. Finally, to process queries submitted to a database, we need to design efficient query processing techniques. To address those challenges, this thesis presents several key components of the MetaQuerier system: first, a search facility searches for useful databases by their schemas; second, a form extractor extracts the query capabilities of databases by applying a best-effort parsing approach based on hidden syntax; third, a form assistant translates queries across pairs of interfaces on-the-fly by deploying a light-weight, domain-based translation framework; fourth, the OPT* framework processes ranked queries as a k-constrained optimization problem. We evaluate our techniques on real databases on the Web. The experimental results show the promise of our system.
Issue Date: 2007
Type: Text
Language: English
Description: 148 p.
Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 2007.
URI: http://hdl.handle.net/2142/81771
Other Identifier(s): (MiAaPQ)AAI3270067
Date Available in IDEALS: 2015-09-25
Date Deposited: 2007

