Selective value coupling learning for detecting outliers in high-dimensional categorical data

Guansong Pang, Hongzuo Xu, Longbing Cao, Wentao Zhao

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

27 Citations (Scopus)

Abstract

This paper introduces a novel framework, SelectVC, and its instance POP for learning selective value couplings (i.e., interactions between the full value set and a set of outlying values) to identify outliers in high-dimensional categorical data. Existing outlier detection methods work on the full data space or on feature subspaces that are identified independently of the subsequent outlier scoring. As a result, they are significantly challenged by the overwhelming number of irrelevant features in high-dimensional data, both because of the noise these features introduce and because of the huge search space. In contrast, SelectVC works on a clean and condensed data space spanned by selective value couplings, which it obtains by jointly optimizing outlying value selection and value outlierness scoring. Its instance POP defines a value outlierness scoring function by modeling a partial outlierness propagation process to capture the selective value couplings. POP further defines a top-k outlying value selection method to ensure scalability to the huge search space. We show that POP (i) significantly outperforms five state-of-the-art full-space or subspace-based outlier detectors, and their combinations with three feature selection methods, on 12 real-world high-dimensional data sets with different levels of irrelevant features; and (ii) achieves good scalability, stable performance w.r.t. k, and a fast convergence rate.

Original language: English
Title of host publication: CIKM '17
Subtitle of host publication: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
Place of Publication: New York
Publisher: Association for Computing Machinery (ACM)
Pages: 807-816
Number of pages: 10
ISBN (Electronic): 9781450349185
Publication status: Published - 2017
Externally published: Yes
Event: 26th ACM International Conference on Information and Knowledge Management, CIKM 2017 - Singapore, Singapore
Duration: 6 Nov 2017 to 10 Nov 2017

Conference

Conference: 26th ACM International Conference on Information and Knowledge Management, CIKM 2017
Country/Territory: Singapore
City: Singapore
Period: 6/11/17 to 10/11/17

Keywords

  • Outlier Detection
  • High-Dimensional Data
  • Categorical Data
  • Feature Selection
  • Coupling Learning
