TY - JOUR
T1 - Online active learning for drifting data streams
AU - Liu, Sanmin
AU - Xue, Shan
AU - Wu, Jia
AU - Zhou, Chuan
AU - Yang, Jian
AU - Li, Zhao
AU - Cao, Jie
PY - 2023/1
Y1 - 2023/1
AB - Classification methods for streaming data are not new, but very few current frameworks address all three of the most common problems with these tasks: concept drift, noise, and the exorbitant costs associated with labeling the unlabeled instances in data streams. Motivated by this gap in the field, we developed an active learning framework based on a dual-query strategy and Ebbinghaus's law of human memory cognition. Called CogDQS, the query strategy samples only the most representative instances for manual annotation based on local density and uncertainty, thus significantly reducing the cost of labeling. The policy for discerning drift from noise and replacing outdated instances with new concepts is based on the three criteria of the Ebbinghaus forgetting curve: recall, the fading period, and the memory strength. Simulations comparing CogDQS with baselines on six different data streams containing gradual drift or abrupt drift with and without noise show that our approach produces accurate, stable models with good generalization ability at minimal labeling, storage, and computation costs.
UR - http://www.scopus.com/inward/record.url?scp=85111008661&partnerID=8YFLogxK
U2 - 10.1109/TNNLS.2021.3091681
DO - 10.1109/TNNLS.2021.3091681
M3 - Article
C2 - 34288874
AN - SCOPUS:85111008661
SN - 2162-237X
VL - 34
SP - 186
EP - 200
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 1
ER -