Description

Title: MEDIATE: Learning to Match Entity Mentions across Text and Databases
Author(s): Doan, AnHai; Li, Xin; Roth, Dan
Subject(s): computer science
Abstract: Many real-world applications increasingly involve both structured data and text. A given real-world entity is often referred to in different ways, such as "Helen Hunt" and "Mrs. H. E. Hunt", both within and across the structured data and the text. Because of this semantic heterogeneity, it remains extremely difficult to combine information about real-world entities from the available data sources and to use both types of information effectively. This paper describes the MEDIATE system, which automatically matches entity mentions within and across both text and databases. The system can handle multiple types of entities (e.g., people, movies, locations), is easily extensible to new entity types, and requires no annotated training data. Given a relational database and a set of text documents, MEDIATE learns from the data a generative model that provides a probabilistic view of how a data creator might have generated mentions, then applies this model to match the mentions. The model exploits the similarity of mention names, common transformations across mentions, and context information such as age, gender, and entity co-occurrence. To maximize matching accuracy, MEDIATE also propagates information across contexts. Experiments on real-world data show that MEDIATE significantly outperforms existing methods that address aspects of this problem, and that it can exploit text to improve record linkage, and vice versa.
Issue Date: 2006-02
Genre: Technical Report
Type: Text
URI: http://hdl.handle.net/2142/11216
Other Identifier(s): UIUCDCS-R-2006-2692
Date Available in IDEALS: 2009-04-21
