LSHiForest: a generic framework for fast tree isolation based ensemble anomaly analysis

Xuyun Zhang, Wanchun Dou, Qiang He, Rui Zhou, Christopher Leckie, Ramamohanarao Kotagiri, Zoran Salcic

Research output: Chapter in Book/Report/Conference proceeding > Conference proceeding contribution > peer-review

80 Citations (Scopus)

Abstract

Anomaly or outlier detection is a major challenge in big data analytics because anomaly patterns provide valuable insights for decision-making in a wide range of applications. Recently proposed anomaly detection methods based on the tree isolation mechanism are very fast due to their logarithmic time complexity, making them capable of handling big data sets efficiently. However, the underlying similarity or distance measures in these methods have not been well understood. Contrary to the claims that these methods never rely on any distance measure, we find that they have close relationships with certain distance measures. This implies that the current use of this fast isolation mechanism is only limited to these distance measures and fails to generalise to other commonly used measures. In this paper, we propose a generic framework named LSHiForest for fast tree isolation based ensemble anomaly analysis with the use of a Locality-Sensitive Hashing (LSH) forest. Being generic, the proposed framework can be instantiated with a diverse range of LSH families, and the fast isolation mechanism can be extended to any distance measures, data types and data spaces where an LSH family is defined. In particular, the instances of our framework with kernelised LSH families or learning based hashing schemes can detect complicated anomalies like local or surrounded anomalies. We also formally show that the existing tree isolation based detection methods are special cases of our framework with the corresponding distance measures. Extensive experiments on both synthetic and real-world benchmark data sets show that the framework can achieve both high time efficiency and anomaly detection quality.
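
The idea described in the abstract, isolating points in a forest whose branching is driven by LSH functions, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a Gaussian random-projection hash of the form floor((a.x + b) / width) as the LSH family, halves the bucket width at each level, trains every tree on the full data set, and uses the plain mean isolation depth as the score. All function names, parameters and the toy data are hypothetical choices made for illustration.

```python
import math
import random


def make_hash(dim, width, rng):
    """Draw one random-projection hash h(x) = floor((a.x + b) / width)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, width)

    def h(x):
        return math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / width)

    return h


def build_tree(points, rng, width=4.0, depth=0, max_depth=10):
    """Recursively split points by hash key until each is isolated."""
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}
    h = make_hash(len(points[0]), width, rng)
    buckets = {}
    for p in points:
        buckets.setdefault(h(p), []).append(p)
    # Assumption of this sketch: halve the bucket width at each level
    # so that dense regions are eventually separated.
    children = {
        key: build_tree(group, rng, width / 2.0, depth + 1, max_depth)
        for key, group in buckets.items()
    }
    return {"hash": h, "children": children}


def path_length(tree, x):
    """Number of hash levels traversed before x reaches a leaf or an empty bucket."""
    depth = 0
    node = tree
    while "children" in node:
        child = node["children"].get(node["hash"](x))
        depth += 1
        if child is None:  # x falls into a bucket no training point used
            break
        node = child
    return depth


def mean_depth(forest, x):
    """Average isolation depth over all trees; lower means more anomalous."""
    return sum(path_length(t, x) for t in forest) / len(forest)


# Toy data: a tight cluster near the origin plus one distant point.
rng = random.Random(0)
inliers = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(64)]
outlier = (8.0, 8.0)
data = inliers + [outlier]

forest = [build_tree(data, rng) for _ in range(50)]
outlier_depth = mean_depth(forest, outlier)
inlier_depth = sum(mean_depth(forest, p) for p in inliers) / len(inliers)
print(f"outlier mean depth: {outlier_depth:.2f}")
print(f"inlier mean depth:  {inlier_depth:.2f}")
```

In this toy setting the distant point tends to land in its own bucket after very few hash levels, while clustered points need more levels to be separated. Swapping `make_hash` for a hash from another LSH family is the part of the sketch that corresponds to the generic instantiation the abstract describes.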

Original language: English
Title of host publication: Proceedings - 2017 IEEE 33rd International Conference on Data Engineering
Subtitle of host publication: ICDE 2017
Place of Publication: Los Alamitos, California
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 983-994
Number of pages: 12
ISBN (Electronic): 9781509065431
ISBN (Print): 9781509065448
Publication status: Published - 16 May 2017
Externally published: Yes
Event: 33rd IEEE International Conference on Data Engineering, ICDE 2017 - San Diego, United States
Duration: 19 Apr 2017 → 22 Apr 2017

Conference

Conference: 33rd IEEE International Conference on Data Engineering, ICDE 2017
Country/Territory: United States
City: San Diego
Period: 19/04/17 → 22/04/17
