Files in this item



Entity Retrieval over Structured Data.pdf (application/pdf, 234kB)


Title: Entity Retrieval over Structured Data
Author(s): Fang, Hui; Sinha, Rishi R.; Wu, Wensheng; Doan, AnHai; Zhai, ChengXiang
Subject(s): entity retrieval; data structures
Abstract: Entity retrieval is the problem of finding information about a given real-world entity (e.g., director Peter Jackson) from one or more data sources. This problem is fundamental in numerous data management settings, but has received little attention. We define the general entity retrieval problem, then discuss the limitations of current information systems (e.g., relational databases, search engines) in solving it. Next, we focus on the specific problem of entity retrieval over structured data (as opposed to text or Web pages). We show that it is inherently more general and difficult than the actively studied problem of entity matching (i.e., record linkage). We then develop the ENRICH system, which significantly extends entity matching solutions to perform entity retrieval. In particular, ENRICH employs clustering techniques to obtain a global picture of how many entities are "out there" and which data fragment should best be assigned to which entity. It also constructs profiles that capture important characteristics of the target entity, then uses the profiles to help the assignment process. Finally, it leverages "query expansion", an idea commonly used in the information retrieval community, to further improve retrieval accuracy. We apply ENRICH to several real-world domains and show that it can perform entity retrieval with high accuracy.
Issue Date: 2005-12
Genre: Technical Report
Other Identifier(s): UIUCDCS-R-2005-2675
Rights Information: You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format, BUT this permission is only for a period of 45 (forty-five) days from the most recent time that you verified that this technical report is still available from the University of Illinois at Urbana-Champaign Computer Science Department under terms that include this permission. All other rights are reserved by the author(s).
Date Available in IDEALS: 2009-04-20
