Graph anomaly detection with few labels: a data-centric approach

Xiaoxiao Ma, Ruikun Li, Fanzhen Liu, Kaize Ding, Jian Yang, Jia Wu

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

Abstract

Anomalous node detection in a static graph faces significant challenges due to the rarity of anomalies and the substantial cost of labeling their deviant structure and attribute patterns. These challenges give rise to data-centric problems, including extremely imbalanced data distributions and intricate graph learning, which significantly hinder machine learning and deep learning methods in discerning the patterns of graph anomalies with few labels. While these issues remain crucial, much of the current research focuses on addressing the induced technical challenges, treating the shortage of labeled data as a given. Distinct from previous efforts, this work focuses on tackling the data-centric problems by generating auxiliary training nodes that conform to the original graph topology and attribute distribution. We categorize this approach as data-centric, aiming to enhance existing anomaly detectors by training them on our synthetic data. However, how to generate such nodes, and how effective synthetic data is for graph anomaly detection, remain unexplored. To answer these questions, we thoroughly investigate the denoising diffusion model. Drawing from our observations on the diffusion process, we illuminate the shifts in graph energy distribution and establish two principles for designing denoising neural networks tailored to graph anomaly generation. Building on these insights, we propose a diffusion-based graph generation method to synthesize training nodes, which can be readily integrated with existing anomaly detectors. Empirical results on eight widely used datasets demonstrate that our generated data effectively enhances the performance of nine state-of-the-art graph anomaly detectors.
Original language: English
Title of host publication: KDD '24
Subtitle of host publication: proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Place of Publication: New York
Publisher: Association for Computing Machinery
Pages: 2153–2164
Number of pages: 12
ISBN (Electronic): 9798400704901
Publication status: Published - 2024
Event: ACM SIGKDD Conference on Knowledge Discovery and Data Mining (30th : 2024) - Barcelona, Spain
Duration: 25 Aug 2024 to 29 Aug 2024

Conference

Conference: ACM SIGKDD Conference on Knowledge Discovery and Data Mining (30th : 2024)
Abbreviated title: KDD '24
Country/Territory: Spain
City: Barcelona
Period: 25/08/24 to 29/08/24

Bibliographical note

Copyright the Author(s) 2024. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.

Keywords

  • Graph Anomaly Detection
  • Generative Graph Diffusion
