Files in this item

Exploring the D ... ver Schematic Metadata.pdf (application/pdf, 296kB; no description provided)

Title: Exploring the Deep Web: Associativity Search over Schematic Metadata
Author(s): Kabra, Govind; Zhang, Zhen; Chang, Kevin Chen-Chuan; Lim, Lipyeow; Wang, Min; Chang, Yuan-Chi
Subject(s): computer science
Abstract: The Web has been rapidly deepened by the prevalence of online databases. As sources proliferate, there are often useful, alternative, and related sources for our needs, yet we lack an effective facility to explore this "deep Web." For "ad-hoc users" and "system integrators" alike, enabling access to and integration of the multitude of sources often requires answering semantic association questions: How do sources relate to each other? What "vocabularies" do they speak? Such semantic associativity is often revealed holistically through co-occurrence analysis of "schematic metadata," which describes the nature of the data at sources. We observe two interesting phenomena in the syntactic associativity of sources and their schematic metadata. The first, occurrence localities, suggests syntactic associativity as a useful notion for discovering semantic associativity; the second, fuzzy boundaries, suggests a query-driven, rank-based mechanism as its realization. We thus propose to build an associativity search facility for systematic exploration of deep Web sources. In its realization, we combine occurrence analysis and link analysis by abstracting the occurrence of metadata in sources as links in a graph, which effectively transforms the associativity of entities into the connectivity of nodes. To quantify associativity, we propose a wave propagation model; to compute it efficiently, we develop spatial and temporal optimization strategies. We validate usefulness and efficiency with a real-world dataset of 30,000 sources. The experiments show that syntactic associativity is not only useful for semantic discovery but also practical as an online search mechanism.
Issue Date: 2006-03
Genre: Technical Report
Other Identifier(s): UIUCDCS-R-2006-2701
Rights Information: You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format, BUT this permission is only for a period of 45 (forty-five) days from the most recent time that you verified that this technical report is still available from the University of Illinois at Urbana-Champaign Computer Science Department under terms that include this permission. All other rights are reserved by the author(s).
Date Available in IDEALS: 2009-04-20
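The abstract's central move, treating each occurrence of a metadata term at a source as a link so that associativity between entities becomes connectivity between graph nodes, can be illustrated with a small sketch. This is a hypothetical example, not the report's wave propagation model: the damped, degree-normalized spreading used here (closer in spirit to a truncated random walk) is a stand-in, and the names `build_graph`, `associativity`, `steps`, and `damping`, as well as the toy source data, are all assumptions.

```python
from collections import defaultdict


def build_graph(sources):
    """Bipartite adjacency: each source is linked to every metadata term it contains.

    Hypothetical helper; nodes are tagged tuples so a source and a term
    with the same name never collide.
    """
    adj = defaultdict(set)
    for src, terms in sources.items():
        s = ("src", src)
        for t in terms:
            m = ("meta", t)
            adj[s].add(m)
            adj[m].add(s)
    return adj


def associativity(adj, start, steps=4, damping=0.5):
    """Rank nodes by damped energy spread outward from `start` (a stand-in model).

    At each step, every node splits its current energy evenly among its
    neighbors, scaled by `damping`. Scores accumulate over all steps;
    the start node itself is excluded from the ranking.
    """
    energy = {start: 1.0}
    score = defaultdict(float)
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, e in energy.items():
            nbrs = adj.get(node, ())
            if not nbrs:
                continue
            share = damping * e / len(nbrs)
            for n in nbrs:
                nxt[n] += share
        for node, e in nxt.items():
            if node != start:
                score[node] += e
        energy = nxt
    return sorted(score.items(), key=lambda kv: kv[1], reverse=True)


# Toy data (assumed): query-interface attributes observed at three sources.
sources = {
    "books_a": {"title", "author", "isbn"},
    "books_b": {"title", "author", "publisher"},
    "flights_a": {"from", "to", "departure date"},
}

for node, s in associativity(build_graph(sources), ("meta", "author")):
    print(node, round(s, 4))
```

Starting from the term "author", energy reaches only the two book sources and the terms they share, so "title" (present at both) ranks above "isbn" or "publisher" (each present at one), while the flight attributes, which are unconnected to "author", receive no score. This mirrors, in a simplified way, the abstract's claim that connectivity in the occurrence graph can surface related vocabulary.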