Files in this item



application/pdf: 3242865.pdf (4MB), restricted to U of Illinois


Title: A Holistic Paradigm for Large Scale Schema Matching
Author(s): He, Bin
Doctoral Committee Chair(s): Chang, Kevin Chen-Chuan
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Computer Science
Abstract: Schema matching is a critical problem for integrating heterogeneous information sources. Traditionally, matching multiple schemas has relied on finding pairwise attribute correspondences in isolation. In contrast, this thesis proposes a new matching paradigm, holistic schema matching, which matches many schemas at the same time and finds all matchings at once. By handling a set of schemas together, we can exploit context information that reflects the semantic correspondences among attributes. Such information is not available when schemas are matched only in pairs. To realize holistic schema matching, we develop two approaches in sequence. First, we develop the MGS framework, which finds simple 1:1 matchings by viewing schema matching as hidden model discovery. Then, to deal with complex matchings, we develop the DCM framework by abstracting schema matching as correlation mining. Further, to automate the entire matching process, we combine the DCM framework with automatically extracted interfaces and find that the inevitable errors in automatic interface extraction may significantly affect the matching result. To make the DCM framework robust against such "noisy" schemas, we propose to integrate it with an ensemble approach that randomizes the schema data into multiple DCM matchers and aggregates their ranked results by majority voting. Finally, because our matching algorithms require as input a large number of schemas in the same domain (e.g., Books and Airfares), we develop an object-focused crawler for effectively collecting query interfaces and a model-differentiation-based approach for clustering schemas into their domain hierarchy.
Issue Date: 2006
Description: 193 p.
Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 2006.
Other Identifier(s): (MiAaPQ)AAI3242865
Date Available in IDEALS: 2015-09-25
Date Deposited: 2006
