TY - JOUR
T1 - SmartVote
T2 - a full-fledged graph-based model for multi-valued truth discovery
AU - Fang, Xiu Susie
AU - Sheng, Quan Z.
AU - Wang, Xianzhi
AU - Chu, Dianhui
AU - Ngu, Anne H. H.
PY - 2019/7
Y1 - 2019/7
N2 - In the era of Big Data, truth discovery has emerged as a fundamental research topic, which estimates data veracity by determining the reliability of multiple, often conflicting data sources. Although considerable research efforts have been conducted on this topic, most current approaches assume only one true value for each object. In reality, objects with multiple true values widely exist and the existing approaches that cope with multi-valued objects still lack accuracy. In this paper, we propose a full-fledged graph-based model, SmartVote, which models two types of source relations with additional quantification to precisely estimate source reliability for effective multi-valued truth discovery. Two graphs are constructed and further used to derive different aspects of source reliability (i.e., positive precision and negative precision) via random walk computations. Our model incorporates four important implications, including two types of source relations, object popularity, loose mutual exclusion, and long-tail phenomenon on source coverage, to pursue better accuracy in truth discovery. Empirical studies on two large real-world datasets demonstrate the effectiveness of our approach.
KW - Graph-based model
KW - Long-tail phenomenon
KW - Multi-valued objects
KW - Object popularity
KW - Source relations
KW - Truth discovery
UR - http://www.scopus.com/inward/record.url?scp=85052651655&partnerID=8YFLogxK
UR - http://purl.org/au-research/grants/arc/FT140101247
UR - http://purl.org/au-research/grants/arc/DP180102378
U2 - 10.1007/s11280-018-0629-3
DO - 10.1007/s11280-018-0629-3
M3 - Article
AN - SCOPUS:85052651655
SN - 1386-145X
VL - 22
SP - 1855
EP - 1885
JO - World Wide Web
JF - World Wide Web
IS - 4
ER -