Abstract
Truth discovery is the fundamental technique for resolving conflicts between the information provided by different data sources by detecting the true values. Traditional methods assume that each data item has only one true value and therefore cannot handle cases where one data item has multiple true values (i.e., multi-value truth). In this work, we address this new challenge and propose a generalized Bayesian framework that comprehensively incorporates the features of multi-value truth for accurate and efficient multi-source data integration. We identify three key features of multi-value truth: source-value mapping, differentiated mutual exclusion, and complicated source dependency. Specifically, sources and values are aggregated based on their mapping to reduce the problem scale, the exclusive relations between values are quantified to reflect the effect of multi-truth, and a fine-grained copy detection method is proposed to deal with complicated source dependency. The data preference of the model is also incorporated for fast parameter configuration. Experimental results on real-world and large-scale synthetic datasets demonstrate the effectiveness of our approach, which requires less execution time and achieves an F1 score that is on average 5% higher than that of the latest method.
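To make the multi-value setting concrete, the following minimal Python sketch illustrates two ideas mentioned above: aggregating sources that claim the same value set for a data item, and scoring a predicted value set with F1. It is only an illustration under stated assumptions, not the method proposed in the paper; the function names `group_sources_by_claims` and `f1_score`, and the example sources and values, are hypothetical.

```python
from collections import defaultdict


def group_sources_by_claims(claims):
    """Group sources that claim exactly the same set of values.

    `claims` maps a (hypothetical) source name to the values it provides
    for one data item. Sources with identical value sets fall into one
    group, which shrinks the number of units a model has to reason about.
    """
    groups = defaultdict(list)
    for source, values in claims.items():
        groups[frozenset(values)].append(source)
    return {value_set: sorted(sources) for value_set, sources in groups.items()}


def f1_score(predicted, truth):
    """F1 of a predicted value set against the set of true values."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: four sources listing the authors of one book.
claims = {
    "s1": ["A", "B"],
    "s2": ["B", "A"],  # same value set as s1, so it joins s1's group
    "s3": ["A"],
    "s4": ["A", "C"],
}
groups = group_sources_by_claims(claims)
```

Here the four sources collapse into three groups, and a prediction of `{"A", "C"}` against a truth of `{"A", "B"}` has precision and recall of 0.5 each. Set-based F1 is used because, with multiple true values, a prediction can be partially correct.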
Original language | English |
---|---|
Pages (from-to) | 1557-1583 |
Number of pages | 27 |
Journal | Computing |
Volume | 106 |
Issue number | 5 |
Publication status | Published - May 2024 |
Keywords
- Truth discovery
- Multi-truth features
- Bayesian model
- Source dependence