Abstract
This paper introduces a novel framework, SelectVC, and its instance POP, for learning selective value couplings (i.e., interactions between the full value set and a set of outlying values) to identify outliers in high-dimensional categorical data. Existing outlier detection methods work on the full data space or on feature subspaces that are identified independently of the subsequent outlier scoring. As a result, they are significantly challenged by the overwhelming number of irrelevant features in high-dimensional data, both because of the noise those features introduce and because of the huge search space they create. In contrast, SelectVC works on a clean and condensed data space spanned by selective value couplings, which it obtains by jointly optimizing outlying value selection and value outlierness scoring. Its instance POP defines a value outlierness scoring function by modeling a partial outlierness propagation process to capture the selective value couplings. POP further defines a top-k outlying value selection method to ensure that it scales to the huge search space. We show that POP (i) significantly outperforms five state-of-the-art full-space or subspace-based outlier detectors, and their combinations with three feature selection methods, on 12 real-world high-dimensional data sets with different levels of irrelevant features; and (ii) achieves good scalability, stable performance w.r.t. k, and a fast convergence rate.
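The alternation between top-k outlying value selection and value outlierness scoring described above can be illustrated with a small toy. This is only a sketch under stated assumptions and is not the POP scoring function, whose definitions the abstract does not give. The function name, the frequency-based initial score, the conditional co-occurrence weight, and the `damping` parameter are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


def toy_selective_coupling_scores(rows, k=2, damping=0.5, iters=20):
    """Score rows of categorical data; higher means more outlying.

    Hypothetical toy, not the paper's POP: every formula here is assumed.
    Requires k >= 1.
    """
    if not rows:
        return []
    n = len(rows)
    # A "value" is a (feature index, category) pair.
    freq = Counter((j, v) for row in rows for j, v in enumerate(row))
    # Co-occurrence counts between values of different features in one row.
    co = Counter()
    for row in rows:
        for a, b in combinations(list(enumerate(row)), 2):
            co[(a, b)] += 1
            co[(b, a)] += 1
    values = sorted(freq)
    # Assumed initial outlierness: rarer values score higher.
    init = {v: 1.0 - freq[v] / n for v in values}
    score = dict(init)
    for _ in range(iters):
        # Selection step: keep only the k values currently rated most outlying.
        top = sorted(values, key=lambda v: score[v], reverse=True)[:k]
        updated = {}
        for v in values:
            # Scoring step: v inherits outlierness only from selected values,
            # weighted by the conditional co-occurrence P(u | v).
            spread = sum(co[(v, u)] / freq[v] * score[u] for u in top if u != v)
            updated[v] = (1 - damping) * init[v] + damping * spread / len(top)
        score = updated
    # A row's score is the sum of the scores of the values it contains.
    return [sum(score[(j, v)] for j, v in enumerate(row)) for row in rows]


# Nine ordinary rows and one row built from two rare, co-occurring values.
rows = [("a", "x")] * 5 + [("b", "x")] * 4 + [("c", "y")]
scores = toy_selective_coupling_scores(rows, k=2)
```

Because propagation is restricted to the selected values, common values such as `("a", "x")` receive no outlierness from one another in this toy, which is the intuition behind working on a condensed space rather than on all value pairs.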
| Original language | English |
|---|---|
| Title of host publication | CIKM '17 |
| Subtitle of host publication | proceedings of the 2017 ACM on Conference on Information and Knowledge Management |
| Place of Publication | New York |
| Publisher | Association for Computing Machinery (ACM) |
| Pages | 807-816 |
| Number of pages | 10 |
| ISBN (Electronic) | 9781450349185 |
| Publication status | Published - 2017 |
| Externally published | Yes |
| Event | 26th ACM International Conference on Information and Knowledge Management, CIKM 2017, Singapore, Singapore, 6 Nov 2017 → 10 Nov 2017 |
Conference
| Conference | 26th ACM International Conference on Information and Knowledge Management, CIKM 2017 |
|---|---|
| Country/Territory | Singapore |
| City | Singapore |
| Period | 6/11/17 → 10/11/17 |
Keywords
- Outlier Detection
- High-Dimensional Data
- Categorical Data
- Feature Selection
- Coupling Learning