TY - GEN
T1 - Multi-truth discovery while being aware of unbalanced data distribution
AU - Fang, Xiu Susie
AU - Sheng, Quan Z.
AU - Sun, Guohao
AU - Chang, Shan
AU - Wang, Hongya
AU - Yang, Jian
PY - 2023
Y1 - 2023
N2 - Due to information explosion, conflicting data on the same object among
multiple sources is ubiquitous on the Web. To solve those conflicts
while estimating source reliability, truth discovery has become a hot
topic. However, when considering multi-value objects, the inevitable
unbalanced data distribution is overlooked by the existing approaches.
In particular, only a few sources make many claims while most sources
provide only a few, which renders the source reliability
estimated for “small” sources totally random. Likewise, some objects are
covered by plenty of sources while others are claimed by only a few,
which makes the value correctness calculated for “cold”
objects unreasonable. To tackle unbalanced data where multi-value
objects exist, we propose a confidence-interval-based approach (CIMTD).
We estimate source reliability from two aspects, i.e., the ability to
claim the correct number of value(s) and specific value(s) on an object.
To reflect the real reliability for both “big” and “small” sources,
confidence intervals of enriched estimation are considered. While
estimating source reliability, uncertainty degrees are introduced to
model object differences. Confidence intervals are also considered to
reflect the real uncertainty for both “hot” and “cold” objects.
Experimental results on two real-world datasets demonstrate the
effectiveness of our approach.
AB - Due to information explosion, conflicting data on the same object among
multiple sources is ubiquitous on the Web. To solve those conflicts
while estimating source reliability, truth discovery has become a hot
topic. However, when considering multi-value objects, the inevitable
unbalanced data distribution is overlooked by the existing approaches.
In particular, only a few sources make many claims while most sources
provide only a few, which renders the source reliability
estimated for “small” sources totally random. Likewise, some objects are
covered by plenty of sources while others are claimed by only a few,
which makes the value correctness calculated for “cold”
objects unreasonable. To tackle unbalanced data where multi-value
objects exist, we propose a confidence-interval-based approach (CIMTD).
We estimate source reliability from two aspects, i.e., the ability to
claim the correct number of value(s) and specific value(s) on an object.
To reflect the real reliability for both “big” and “small” sources,
confidence intervals of enriched estimation are considered. While
estimating source reliability, uncertainty degrees are introduced to
model object differences. Confidence intervals are also considered to
reflect the real uncertainty for both “hot” and “cold” objects.
Experimental results on two real-world datasets demonstrate the
effectiveness of our approach.
UR - https://www.scopus.com/pages/publications/85169622261
U2 - 10.1109/IJCNN54540.2023.10191906
DO - 10.1109/IJCNN54540.2023.10191906
M3 - Conference proceeding contribution
AN - SCOPUS:85169622261
SN - 9781665488686
BT - 2023 International Joint Conference on Neural Networks (IJCNN)
PB - Institute of Electrical and Electronics Engineers (IEEE)
CY - Piscataway, NJ
T2 - 2023 International Joint Conference on Neural Networks, IJCNN 2023
Y2 - 18 June 2023 through 23 June 2023
ER -