Title: Scalable Mining and Link Analysis Across Multiple Database Relations
Author(s): Yin, Xiaoxin
Subject(s): data mining
Abstract: Relational databases are the most popular repository for structured data, and are thus one of the richest sources of knowledge in the world. In a relational database, multiple relations are linked together via entity-relationship links. Unfortunately, most existing data mining approaches can only handle data stored in single tables, and cannot be applied to relational databases. Therefore, it is an urgent task to design data mining approaches that can discover knowledge from multi-relational data. In this thesis we study the three most important data mining tasks in multi-relational environments: classification, clustering, and duplicate detection. Since information is widely spread across multiple relations, the most crucial and common challenge in multi-relational data mining is how to utilize the relational information linked with each object. We rely on two types of information, neighbor tuples and linkages between objects, to analyze the properties of objects and the relationships among them. Because of the complexity of multi-relational data, efficiency and scalability are two major concerns in multi-relational data mining. In this thesis we propose scalable and accurate approaches for each data mining task studied. To achieve high efficiency and scalability, the approaches use novel techniques for virtually joining different relations, single-scan algorithms, and multi-resolutional data structures to dramatically reduce computational costs. Our experiments show that our approaches are highly efficient and scalable, and also achieve high accuracy in multi-relational data mining.
Issue Date: 2007-03
Genre: Technical Report
Other Identifier(s): UIUCDCS-R-2007-2805
Rights Information: You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format, BUT this permission is only for a period of 45 (forty-five) days from the most recent time that you verified that this technical report is still available from the University of Illinois at Urbana-Champaign Computer Science Department under terms that include this permission. All other rights are reserved by the author(s).
Date Available in IDEALS: 2009-04-21
