Title:Entity Retrieval over Structured Data
Author(s):Fang, Hui; Sinha, Rishi R.; Wu, Wensheng; Doan, AnHai; Zhai, ChengXiang
Subject(s):entity retrieval
data structures
Abstract:Entity retrieval is the problem of finding information about a given real-world entity (e.g., director Peter Jackson) from one or more data sources. This problem is fundamental in numerous data management settings, but has received little attention. We define the general entity retrieval problem, then discuss the limitations of current information systems (e.g., relational databases, search engines) in solving it. Next, we focus on the specific problem of entity retrieval over structured data (as opposed to text or Web pages). We show that it is inherently more general and difficult than the actively studied problem of entity matching (i.e., record linkage). We then develop the ENRICH system, which significantly extends entity matching solutions to perform entity retrieval. In particular, ENRICH employs clustering techniques to obtain a global picture of how many entities are "out there" and which data fragment should best be assigned to which entity. It also constructs profiles that capture important characteristics of the target entity, then uses the profiles to help the assignment process. Finally, it leverages "query expansion", an idea commonly used in the information retrieval community, to further improve retrieval accuracy. We apply ENRICH to several real-world domains, and show that it can perform entity retrieval with high accuracy.
Issue Date:2005-12
Genre:Technical Report
Other Identifier(s):UIUCDCS-R-2005-2675
