TY - GEN
T1 - Bayesian heteroskedastic choice modeling on non-identically distributed linkages
AU - Hu, Liang
AU - Cao, Wei
AU - Cao, Jian
AU - Xu, Guandong
AU - Cao, Longbing
AU - Gu, Zhiping
PY - 2014
Y1 - 2014
AB - Choice modeling (CM) aims to describe and predict choices according to the attributes of subjects and options. If we regard each act of choice making as the formation of a link between a subject and an option, CM can be bridged immediately to the link analysis and prediction (LAP) problem. However, such a mapping is often neither trivial nor straightforward. In LAP problems, the only available observations are links among objects, while their attributes are often inaccessible. Therefore, we extend CM into a latent feature space to avoid the need for explicit attributes. Moreover, LAP is usually based on a binary linkage assumption that models observed links as positive instances and unobserved links as negative instances. Instead, we use a weaker assumption that treats unobserved links as pseudo-negative instances. Furthermore, most subjects or options may be quite heterogeneous due to the long-tail distribution, which conventional LAP approaches fail to capture. To address the above challenges, we propose a Bayesian heteroskedastic choice model to represent the non-identically distributed linkages in LAP problems. Finally, an empirical evaluation on real-world datasets demonstrates the superiority of our approach.
KW - link analysis and prediction
KW - heteroskedastic choice model
KW - non-IID Bayesian analysis
KW - parallel Gibbs sampling
UR - http://www.scopus.com/inward/record.url?scp=84936942130&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2014.84
DO - 10.1109/ICDM.2014.84
M3 - Conference proceeding contribution
SP - 851
EP - 856
BT - 14th IEEE International Conference on Data Mining, ICDM 2014
A2 - Kumar, Ravi
A2 - Toivonen, Hannu
A2 - Pei, Jian
A2 - Huang, Joshua Zhexue
A2 - Wu, Xindong
PB - Institute of Electrical and Electronics Engineers (IEEE)
CY - Piscataway, NJ
T2 - 14th IEEE International Conference on Data Mining, ICDM 2014
Y2 - 14 December 2014 through 17 December 2014
ER -