Truth discovery via exploiting implications from multi-source data

Xianzhi Wang, Quan Z. Sheng, Lina Yao, Xue Li, Xiu Susie Fang, Xiaofei Xu, Boualem Benatallah

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

23 Citations (Scopus)

Abstract

Data veracity is a grand challenge for various tasks on the Web. Since web data sources are inherently unreliable and may provide conflicting information about the same real-world entities, truth discovery is emerging as a countermeasure for resolving such conflicts by discovering, from the multi-source data, the true values that conform to reality. A major challenge for truth discovery is that different data items may have varying numbers of true values (multi-truth), which contradicts the assumption of existing truth discovery methods that each data item has exactly one true value. In this paper, we address this challenge by exploiting the implications of multi-source data. Specifically, we exploit three types of implications to facilitate multi-truth discovery: the implicit negative claims, the distribution of positive/negative claims, and the co-occurrence of values in sources' claims. We propose a probabilistic approach with improvement measures that incorporate the three implications in all stages of the truth discovery process. Incorporating the negative claims enables multi-truth discovery; considering the distribution of positive/negative claims relieves truth discovery of the impact of sources' behavioral features in specific datasets; and considering the co-occurrence relationship among values compensates for the information lost when each value in the same claim is evaluated individually. Experimental results on three real-world datasets demonstrate the effectiveness of our approach.
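To make the first implication concrete, the sketch below turns positive-only observations into explicit positive and implicit negative claims: a source that reports values for an item but omits a candidate value that some other source reports is treated as implicitly denying it. It then scores values with a generic iterative, reliability-weighted vote. This is a simplified illustration under stated assumptions, not the paper's probabilistic model: it does not model the distribution of positive/negative claims or value co-occurrence. The function names, the 0.8 starting reliability, and the 0.5 acceptance threshold are all hypothetical choices.

```python
from collections import defaultdict


def derive_claims(observations):
    """Expand positive-only observations into positive and implicit negative claims.

    observations: {item: {source: set_of_values}} (hypothetical input format)
    returns: list of (item, value, source, is_positive) tuples
    """
    claims = []
    for item, by_source in observations.items():
        # Candidate values for an item are all values any source reports for it.
        candidates = set().union(*by_source.values())
        for source, values in by_source.items():
            for value in candidates:
                # A source that covers `item` but omits `value` is assumed
                # to implicitly claim that `value` is false.
                claims.append((item, value, source, value in values))
    return claims


def multi_truth_discovery(observations, iterations=10, prior=0.8):
    """Generic iterative weighted voting over positive and negative claims (a sketch)."""
    claims = derive_claims(observations)
    sources = {source for _, _, source, _ in claims}
    reliability = {source: prior for source in sources}  # assumed starting value
    truth = {}
    for _ in range(iterations):
        # Step 1: a value's truth probability is the reliability-weighted share
        # of sources claiming it positively; negative claims count against it.
        support = defaultdict(float)
        total = defaultdict(float)
        for item, value, source, positive in claims:
            weight = max(reliability[source], 1e-9)  # avoid division by zero
            total[(item, value)] += weight
            if positive:
                support[(item, value)] += weight
        truth = {key: support[key] / total[key] for key in total}
        # Step 2: a source's reliability is the average agreement of all its
        # claims (positive and implicit negative) with the current estimates.
        agreement = defaultdict(float)
        count = defaultdict(int)
        for item, value, source, positive in claims:
            p = truth[(item, value)]
            agreement[source] += p if positive else 1.0 - p
            count[source] += 1
        reliability = {source: agreement[source] / count[source] for source in sources}
    return truth, reliability


if __name__ == "__main__":
    example = {
        "book1": {"s1": {"Alice", "Bob"}, "s2": {"Alice", "Bob"}, "s3": {"Alice"}},
        "book2": {"s1": {"Carol"}, "s2": {"Carol"}, "s3": {"Carol", "Dave"}},
    }
    truth, reliability = multi_truth_discovery(example)
    # Accept every value whose probability exceeds 0.5, so an item may end up
    # with zero, one, or several true values.
    accepted = sorted(key for key, p in truth.items() if p > 0.5)
    print(accepted)
```

Because omissions are counted as negative claims, "book1" can keep two true authors while the value reported only by the less reliable source "s3" for "book2" is rejected; a single-truth method that picks one winner per item could not express the first outcome.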

Original language: English
Title of host publication: CIKM 2016
Subtitle of host publication: Proceedings of the 2016 ACM Conference on Information and Knowledge Management
Place of publication: New York, NY
Publisher: Association for Computing Machinery
Pages: 861-870
Number of pages: 10
ISBN (Electronic): 9781450340731
Publication status: Published - 2016
Externally published: Yes
Event: 25th ACM International Conference on Information and Knowledge Management, CIKM 2016 - Indianapolis, United States
Duration: 24 Oct 2016 to 28 Oct 2016

Other

Other: 25th ACM International Conference on Information and Knowledge Management, CIKM 2016
Country/Territory: United States
City: Indianapolis
Period: 24/10/16 to 28/10/16

Keywords

  • truth discovery
  • multiple true values
  • probabilistic model
  • imbalanced claims
