Abstract
A novel online ensemble strategy, ensemble BPegasos (EBPegasos), is proposed to address the problems that concept drift and the curse of dimensionality jointly cause when classifying high-dimensional evolving data streams, a combination that has not been addressed in the literature. First, EBPegasos uses BPegasos, an online kernelized SVM-based algorithm, as the component classifier to address the scalability and sparsity of high-dimensional data. Second, EBPegasos takes full advantage of the characteristics of BPegasos to cope with various types of concept drift. Specifically, EBPegasos constructs diverse component classifiers by controlling the budget size of BPegasos; it also equips each component with a drift detector to monitor and evaluate its performance, and modifies the ensemble structure only when large performance degradation occurs. This conditional structural modification strategy lets EBPegasos strike a good balance between exploiting and forgetting old knowledge. Finally, experiments first show that EBPegasos is more effective and resource-efficient than tree ensembles on high-dimensional data. Comprehensive experiments on synthetic and real-life datasets then show that EBPegasos copes with various types of concept drift significantly better than state-of-the-art ensemble frameworks when all ensembles use BPegasos as the base learner.
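The structure described in the abstract (budget-limited kernel learners of different sizes, one drift detector per component, and structural change only on large degradation) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The class names, the first-in-first-out budget maintenance, the windowed error-rate detector, the unweighted majority vote, and the reset-on-drift rule are all assumptions chosen for brevity; the abstract does not specify any of them.

```python
import math
from collections import deque


def rbf(x, z, gamma=0.5):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


class BudgetedKernelPegasos:
    """Simplified kernelized Pegasos that keeps at most `budget` support vectors.

    Assumption: when the budget is exceeded, the oldest support vector is dropped.
    """

    def __init__(self, budget, lam=0.01):
        self.budget = budget
        self.lam = lam
        self.sv = []  # list of (vector, signed coefficient)
        self.t = 0

    def score(self, x):
        return sum(a * rbf(s, x) for s, a in self.sv)

    def predict(self, x):
        return 1 if self.score(x) >= 0 else -1

    def learn(self, x, y):
        self.t += 1
        eta = 1.0 / (self.lam * self.t)          # Pegasos step size
        violated = y * self.score(x) < 1.0       # hinge-loss margin check
        shrink = 1.0 - eta * self.lam            # equals 1 - 1/t
        self.sv = [(s, a * shrink) for s, a in self.sv]
        if violated:
            self.sv.append((x, eta * y))
            if len(self.sv) > self.budget:
                self.sv.pop(0)                   # hypothetical FIFO removal


class ErrorRateDetector:
    """Hypothetical detector: flags drift when the windowed error rate
    exceeds the lowest rate seen so far by more than `threshold`."""

    def __init__(self, window=30, threshold=0.3):
        self.window = deque(maxlen=window)
        self.threshold = threshold
        self.best = None

    def update(self, wrong):
        self.window.append(1 if wrong else 0)
        if len(self.window) < self.window.maxlen:
            return False                         # not enough evidence yet
        rate = sum(self.window) / len(self.window)
        if self.best is None or rate < self.best:
            self.best = rate
        return rate - self.best > self.threshold


class BudgetEnsemble:
    """Components differ only in budget size; each has its own detector."""

    def __init__(self, budgets=(10, 20, 40)):
        self.budgets = list(budgets)
        self.members = [BudgetedKernelPegasos(b) for b in self.budgets]
        self.detectors = [ErrorRateDetector() for _ in self.budgets]
        self.resets = 0

    def predict(self, x):
        votes = sum(m.predict(x) for m in self.members)
        return 1 if votes >= 0 else -1           # unweighted majority vote

    def learn(self, x, y):
        for i, member in enumerate(self.members):
            wrong = member.predict(x) != y       # test-then-train
            if self.detectors[i].update(wrong):
                # Assumed structural change: replace only the degraded component.
                self.members[i] = BudgetedKernelPegasos(self.budgets[i])
                self.detectors[i] = ErrorRateDetector()
                self.resets += 1
            self.members[i].learn(x, y)
```

In use, a stream loop would call `predict(x)` and then `learn(x, y)` for each labelled example with `y` in `{-1, 1}`. Components that keep performing well are left untouched, which is one way to read the balance between exploiting and forgetting old knowledge that the abstract describes.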
Original language | English |
---|---|
Pages (from-to) | 1242–1265 |
Number of pages | 24 |
Journal | Data Mining and Knowledge Discovery |
Volume | 31 |
Issue number | 5 |
DOIs | |
Publication status | Published - Sept 2017 |
Externally published | Yes |
Event | European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML-PKDD 2017 - Skopje, Macedonia, The Former Yugoslav Republic of. Duration: 18 Sept 2017 → 22 Sept 2017 |
Keywords
- High dimensionality
- Concept drift
- Data stream classification
- Online ensemble