TY - JOUR
T1 - Positive distribution pollution
T2 - 37th AAAI Conference on Artificial Intelligence, AAAI 2023
AU - Liang, Qianqiao
AU - Zhu, Mengying
AU - Wang, Yan
AU - Wang, Xiuyuan
AU - Zhao, Wanjia
AU - Yang, Mengyuan
AU - Wei, Hua
AU - Han, Bing
AU - Zheng, Xiaolin
PY - 2023
Y1 - 2023
N2 - Positive Unlabeled (PU) learning, which has a wide range of applications, is becoming increasingly prevalent. However, it suffers from problems such as data imbalance, selection bias, and prior agnostic in real scenarios. Existing studies focus on addressing part of these problems, which fail to provide a unified perspective to understand these problems. In this paper, we first rethink these problems by analyzing a typical PU scenario and come up with an insightful point of view that all these problems are inherently connected to one problem, i.e., positive distribution pollution, which refers to the inaccuracy in estimating positive data distribution under very little labeled data. Then, inspired by this insight, we devise a variational model named CoVPU, which addresses all three problems in a unified perspective by targeting the positive distribution pollution problem. CoVPU not only accurately separates the positive data from the unlabeled data based on discrete normalizing flows, but also effectively approximates the positive distribution based on our derived unbiased rebalanced risk estimator and supervises the approximation based on a novel prior-free variational loss. Rigorous theoretical analysis proves the convergence of CoVPU to an optimal Bayesian classifier. Extensive experiments demonstrate the superiority of CoVPU over the state-of-the-art PU learning methods under these problems.
AB - Positive Unlabeled (PU) learning, which has a wide range of applications, is becoming increasingly prevalent. However, it suffers from problems such as data imbalance, selection bias, and prior agnostic in real scenarios. Existing studies focus on addressing part of these problems, which fail to provide a unified perspective to understand these problems. In this paper, we first rethink these problems by analyzing a typical PU scenario and come up with an insightful point of view that all these problems are inherently connected to one problem, i.e., positive distribution pollution, which refers to the inaccuracy in estimating positive data distribution under very little labeled data. Then, inspired by this insight, we devise a variational model named CoVPU, which addresses all three problems in a unified perspective by targeting the positive distribution pollution problem. CoVPU not only accurately separates the positive data from the unlabeled data based on discrete normalizing flows, but also effectively approximates the positive distribution based on our derived unbiased rebalanced risk estimator and supervises the approximation based on a novel prior-free variational loss. Rigorous theoretical analysis proves the convergence of CoVPU to an optimal Bayesian classifier. Extensive experiments demonstrate the superiority of CoVPU over the state-of-the-art PU learning methods under these problems.
UR - http://www.scopus.com/inward/record.url?scp=85168245553&partnerID=8YFLogxK
U2 - 10.1609/aaai.v37i7.26051
DO - 10.1609/aaai.v37i7.26051
M3 - Conference paper
AN - SCOPUS:85168245553
SN - 2374-3468
VL - 37
SP - 8737
EP - 8745
JO - Proceedings of the AAAI Conference on Artificial Intelligence
JF - Proceedings of the AAAI Conference on Artificial Intelligence
IS - 7
Y2 - 7 February 2023 through 14 February 2023
ER -