Abstract
Data veracity is a grand challenge for various tasks on the Web. Because web data sources are inherently unreliable and may provide conflicting information about the same real-world entities, truth discovery is emerging as a countermeasure that resolves such conflicts by discovering the truth, i.e., the values that conform to reality, from multi-source data. A major challenge in truth discovery is that different data items may have varying numbers of true values (the multi-truth problem), which contradicts the assumption of existing truth discovery methods that each data item has exactly one true value. In this paper, we address this challenge by leveraging implications drawn from multi-source data. In particular, we exploit three types of implications to facilitate multi-truth discovery: the implicit negative claims, the distribution of positive/negative claims, and the co-occurrence of values in sources' claims. We propose a probabilistic approach with improvement measures that incorporate the three implications in all stages of the truth discovery process. Specifically, incorporating negative claims enables multi-truth discovery; considering the distribution of positive/negative claims relieves truth discovery from the impact of sources' behavioral features in specific datasets; and considering the co-occurrence of values compensates for the information lost when each value in the same claim is evaluated individually. Experimental results on three real-world datasets demonstrate the effectiveness of our approach.
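As a rough illustration of two of the implications named in the abstract, the sketch below derives positive and implicit negative claims for a single data item and counts how often values co-occur in sources' claims. It is not the paper's probabilistic model. It assumes that a source which claims some values for an item implicitly denies every other candidate value for that item, which is one plausible reading of "implicit negative claims"; the function names `derive_claims` and `cooccurrence` and the sample data are hypothetical.

```python
from collections import defaultdict


def derive_claims(claims):
    """Split sources into positive and implicit negative claimants per value.

    `claims` maps a source name to the set of values it asserts for ONE
    data item. Assumption (not taken from the paper): a source that omits
    a candidate value is treated as implicitly denying it.
    """
    # Candidate values are everything any source has claimed for the item.
    candidates = set().union(*claims.values())
    result = {}
    for value in candidates:
        positive = {s for s, vals in claims.items() if value in vals}
        negative = set(claims) - positive  # implicit negative claims
        result[value] = (positive, negative)
    return result


def cooccurrence(claims):
    """Count how many sources claim each unordered pair of values together."""
    counts = defaultdict(int)
    for vals in claims.values():
        ordered = sorted(vals)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                counts[(a, b)] += 1
    return dict(counts)


# Hypothetical example: three sources listing the authors of one book.
sample = {
    "s1": {"A", "B"},
    "s2": {"A"},
    "s3": {"A", "B", "C"},
}
per_value = derive_claims(sample)
pairs = cooccurrence(sample)
```

In this toy input, value `C` has one positive claimant and two implicit negative claimants, while the pair `("A", "B")` is claimed together by two sources. A full method would feed such counts into a probabilistic model of source reliability rather than use them directly.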
Original language | English |
---|---|
Title of host publication | CIKM 2016 |
Subtitle of host publication | Proceedings of the 2016 ACM Conference on Information and Knowledge Management |
Place of Publication | New York, NY |
Publisher | Association for Computing Machinery |
Pages | 861-870 |
Number of pages | 10 |
ISBN (Electronic) | 9781450340731 |
Publication status | Published - 2016 |
Externally published | Yes |
Event | 25th ACM International Conference on Information and Knowledge Management, CIKM 2016 - Indianapolis, United States Duration: 24 Oct 2016 → 28 Oct 2016 |
Other
Other | 25th ACM International Conference on Information and Knowledge Management, CIKM 2016 |
---|---|
Country/Territory | United States |
City | Indianapolis |
Period | 24/10/16 → 28/10/16 |
Keywords
- truth discovery
- multiple true values
- probabilistic model
- imbalanced claims