Weaving Entities into Relations: From Page Retrieval to Relation Mining on the Web

Bookmark or cite this item: http://hdl.handle.net/2142/10968

Title: Weaving Entities into Relations: From Page Retrieval to Relation Mining on the Web
Author(s): Kelley, Joseph M.; Chang, Kevin C-C.; Cheng, Tao; Chuang, Shui-Lung; Davis, William
Subject(s): data mining
Abstract: With its sheer amount of information, the Web is clearly an important frontier for data mining. While Web mining must start with content on the Web, there is no effective "search-based" mechanism to help sift through that information. Our goal is to provide such an online search-based facility for supporting query primitives, upon which Web mining applications can be built. As a first step, this paper aims at entity-relation discovery, or E-R discovery, as a useful function to weave scattered entities on the Web into coherent relations. To begin with, as our proposal, we formalize the concept of E-R discovery. Further, to realize E-R discovery, as our main thesis, we abstract tuple ranking, the essential challenge of E-R discovery, as pattern-based cooccurrence analysis. Finally, as our key insight, we observe that such relation mining shares the same core functions as traditional page-retrieval systems, which enables us to build the new E-R discovery upon today's search engines, almost for free. We report our system prototype and testbed, WISDM-ER, with a real Web corpus. Our case studies have shown high promise, achieving 83%-91% accuracy on real benchmark queries, and thus demonstrate the real possibility of enabling ad-hoc Web mining tasks with online E-R discovery.
Issue Date: 2004-11
Genre: Technical Report
Type: Text
URI: http://hdl.handle.net/2142/10968
Rights Information: You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format, BUT this permission is only for a period of 45 (forty-five) days from the most recent time that you verified that this technical report is still available from the University of Illinois at Urbana-Champaign Computer Science Department under terms that include this permission. All other rights are reserved by the author(s).
Date Available in IDEALS: 2009-04-17
 
