Abstract
A key challenge for most conventional clustering algorithms in real-world problems is that data points in different clusters are often correlated with different subsets of features. Subspace clustering, which addresses this problem, has attracted increasing attention in recent years. In practical data mining applications, data points may arrive in continuous streams, with chunks of samples collected at different time points. In addition, large volumes of data often cannot be held in main memory. A range of evolving clustering algorithms has been proposed for these settings; however, traditional evolving clustering methods cannot be applied effectively to large-scale high-dimensional data and data streams. In this study, we extend the online learning strategy and the scalable clustering technique to soft subspace clustering, yielding evolving soft subspace clustering. We propose two online soft subspace clustering algorithms, OFWSC and OEWSC, and two streaming soft subspace clustering algorithms, SSSC-F and SSSC-E. The proposed approach combines the effectiveness of online learning and scalable clustering methods for streaming data with the ability to reveal the important local subspace characteristics of high-dimensional data. Extensive experiments on both artificial and real-world datasets demonstrate that the proposed methods are generally effective in evolving clustering and outperform existing soft subspace clustering techniques.
Original language | English |
---|---|
Pages (from-to) | 210-228 |
Number of pages | 19 |
Journal | Applied Soft Computing Journal |
Volume | 14 |
Issue number | Part B |
Publication status | Published - Jan 2014 |
Externally published | Yes |
Keywords
- Scalable clustering
- Data stream clustering
- Online clustering
- Subspace clustering