Classification of high-dimensional evolving data streams via a resource-efficient online ensemble

Tingting Zhai, Yang Gao*, Hao Wang, Longbing Cao

*Corresponding author for this work

Research output: Contribution to journal › Conference paper › peer-review

19 Citations (Scopus)

Abstract

A novel online ensemble strategy, ensemble BPegasos (EBPegasos), is proposed to address concept drift and the curse of dimensionality simultaneously when classifying high-dimensional evolving data streams, a combination that has not been addressed in the literature. First, EBPegasos uses BPegasos, an online kernelized SVM-based algorithm, as the component classifier to address the scalability and sparsity of high-dimensional data. Second, EBPegasos takes full advantage of the characteristics of BPegasos to cope with various types of concept drift. Specifically, EBPegasos constructs diverse component classifiers by controlling the budget size of BPegasos; it also equips each component with a drift detector to monitor and evaluate its performance, and modifies the ensemble structure only when a large performance degradation occurs. This conditional structural modification strategy lets EBPegasos strike a good balance between exploiting and forgetting old knowledge. Finally, experiments first show that EBPegasos is more effective and resource-efficient than tree ensembles on high-dimensional data. Comprehensive experiments on synthetic and real-life datasets then show that EBPegasos copes with various types of concept drift significantly better than state-of-the-art ensemble frameworks when all ensembles use BPegasos as the base learner.
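
The mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class names (`BudgetedPegasos`, `WindowDriftDetector`, `BudgetEnsemble`), the smallest-coefficient removal rule, the window-based drift test, the majority vote, and all parameter values are assumptions chosen for illustration. The sketch only shows the general shape: components that differ in budget size, one detector per component, and a structural change only when a component degrades strongly.

```python
import math
import random
from collections import deque

def rbf(x, z, gamma=0.5):
    """Gaussian kernel between two dense feature lists."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

class BudgetedPegasos:
    """Kernelized Pegasos with a hard cap on stored support vectors."""
    def __init__(self, budget, lam=0.01):
        self.budget, self.lam, self.t = budget, lam, 0
        self.sv = []  # entries are [alpha, x]

    def score(self, x):
        return sum(a * rbf(z, x) for a, z in self.sv)

    def predict(self, x):
        return 1 if self.score(x) >= 0 else -1

    def update(self, x, y):
        self.t += 1
        margin = y * self.score(x)     # margin under the current model
        shrink = 1.0 - 1.0 / self.t    # Pegasos decay for step size 1 / (lam * t)
        for pair in self.sv:
            pair[0] *= shrink
        if margin < 1:                 # hinge loss is active: store the example
            self.sv.append([y / (self.lam * self.t), x])
            if len(self.sv) > self.budget:
                # Assumption: enforce the budget by dropping the smallest |alpha|.
                i = min(range(len(self.sv)), key=lambda j: abs(self.sv[j][0]))
                del self.sv[i]

class WindowDriftDetector:
    """Stand-in detector: fires when the windowed error rate exceeds the best seen by tol."""
    def __init__(self, window=50, tol=0.25):
        self.errors, self.best, self.tol = deque(maxlen=window), 1.0, tol

    def add(self, wrong):
        self.errors.append(1 if wrong else 0)
        if len(self.errors) < self.errors.maxlen:
            return False               # not enough evidence yet
        rate = sum(self.errors) / len(self.errors)
        self.best = min(self.best, rate)
        return rate > self.best + self.tol

class BudgetEnsemble:
    """Components differ only in budget size; each has its own detector."""
    def __init__(self, budgets=(10, 20, 40)):
        self.budgets = list(budgets)
        self.members = [BudgetedPegasos(b) for b in self.budgets]
        self.detectors = [WindowDriftDetector() for _ in self.budgets]
        self.resets = 0

    def predict(self, x):
        return 1 if sum(m.predict(x) for m in self.members) >= 0 else -1

    def learn(self, x, y):
        for i, m in enumerate(self.members):
            wrong = m.predict(x) != y  # test first, then train
            m.update(x, y)
            if self.detectors[i].add(wrong):
                # Change the structure only on large degradation:
                # replace this component and restart its detector.
                self.members[i] = BudgetedPegasos(self.budgets[i])
                self.detectors[i] = WindowDriftDetector()
                self.resets += 1

def run_demo(n=600, seed=0):
    """Synthetic stream whose labels flip halfway (an abrupt drift)."""
    rng = random.Random(seed)
    ens, correct_tail = BudgetEnsemble(), 0
    for t in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        y = 1 if x[0] >= 0 else -1
        if t >= n // 2:
            y = -y
        if t >= n - 100:
            correct_tail += ens.predict(x) == y
        ens.learn(x, y)
    return ens.resets, correct_tail / 100

if __name__ == "__main__":
    print(run_demo())  # (number of component resets, accuracy on the last 100 items)
```

In this sketch a component that is still performing well is left untouched, so knowledge learned before a drift is kept until a detector reports a clear drop, at which point only the affected component is rebuilt. The paper's actual budget maintenance, drift detector, and combination rule may differ from these stand-ins.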

Original language: English
Pages (from-to): 1242–1265
Number of pages: 24
Journal: Data Mining and Knowledge Discovery
Volume: 31
Issue number: 5
Publication status: Published - Sept 2017
Externally published: Yes
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML-PKDD 2017 - Skopje, Macedonia, The Former Yugoslav Republic of
Duration: 18 Sept 2017 to 22 Sept 2017

Keywords

  • High dimensionality
  • Concept drift
  • Data stream classification
  • Online ensemble
