Files in this item



3290314.pdf (application/pdf, 4MB). Restricted to U of Illinois.


Title: Efficient Data Integration: Automation, Collaboration, and Relaxation
Author(s): McCann, Robert Lee
Doctoral Committee Chair(s): Doan, AnHai
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Computer Science
Abstract: While the previous two directions reduce integration costs by improving the performance of automatic tools (either by improvements to the tool itself, or by leveraging users to boost tool accuracy), the last direction explored in this thesis attacks data integration costs at their foundation: rigidity. The current data integration system model imposes a very rigid structure on its components and the data that is passed between components. For example, wrappers are responsible for extracting precise structured data, allowing traditional structured query processing techniques to compute the query result. However, my third direction explores our ability to relax these assumptions, thereby allowing us to answer queries without suffering unnecessary costs required in the traditional model (e.g., building full-fledged wrappers). In this thesis I investigate this idea within the context of supporting one-time, on-the-fly queries over distributed Web data. I develop and evaluate SLIC, a system that allows a user to quickly pose SQL queries over multiple sources (after only some minimal preprocessing), obtain initial results, then iterate with the system to get increasingly better results. The fundamental idea is to learn only as much structure as necessary to answer a given query. Extensive experiments on real-world domains show that for many practical queries SLIC is significantly faster than current methods, thus providing a promising first step toward a principled solution for lazy, on-the-fly integration of Web data, and hopefully sparking interest in our potential to remove some of the fundamental costs inherent in the traditional integration system model.
Issue Date: 2007
Description: 121 p.
Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 2007.
Other Identifier(s): (MiAaPQ)AAI3290314
Date Available in IDEALS: 2015-09-25
Date Deposited: 2007
