Files in this item

File: Shi_Zhi.pdf (1MB)
Description: (no description provided)
Format: PDF (application/pdf)

Description

Title: Integrating multiple conflicting sources by truth discovery and source quality estimation
Author(s): Zhi, Shi
Advisor(s): Han, Jiawei
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: M.S.
Genre: Thesis
Subject(s): Truth Discovery; Data Integration; Data Quality
Abstract: Multiple descriptions of the same entity from different sources will inevitably result in inconsistent data or information. Among conflicting pieces of information, which one is the most trustworthy? How can a fraudulent rumor be detected? It is unrealistic to curate and validate the trustworthiness of every piece of information, because human labeling is costly and experts are scarce. Much research on finding the truth about each entity has shown that considering the quality of information providers can improve the performance of data integration. Because data sources differ in quality, it is hard to find a general solution that works for every case. Therefore, we start from a general setting of truth analysis and then narrow down to two basic problems in data integration. We first propose a general framework for numerical data that allows flexible definition of the loss function. Source quality is represented by a vector that models the source's credibility in different error intervals. We then propose a new method, called the No Truth Truth Model (NTTM), to deal with the truth existence problem in low-quality data. Preliminary experiments on real stock data and slot filling data show promising results.
Issue Date: 2014-09-16
URI: http://hdl.handle.net/2142/50493
Rights Information: Copyright 2014 Shi Zhi
Date Available in IDEALS: 2014-09-16; 2016-09-22
Date Deposited: 2014-08

