Dynamic self-attention gated spatial-temporal graph convolutional network for skeleton-based human activity recognition

Yi Xia*, Sira Yongchareon, Raymond Lutui, Quan Z. Sheng

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

Abstract

Human Activity Recognition (HAR) involves the automatic detection and classification of human actions from sensor data, enabling numerous applications in fields such as healthcare, security, and human-computer interaction. Skeleton-based HAR has gained significant attention due to its robustness against variations in appearance and environmental conditions, leveraging the structural information of human poses to recognize and classify activities accurately. Despite significant advancements, current deep learning models often rely on fixed topological structures and cannot directly model long-range dependencies or distinguish subtle motions. To address these challenges, we propose a Dynamic Self-Attention Gated Spatial-Temporal Graph Convolutional Network (DSAT-GCN) that integrates graph convolutional networks (GCNs) with advanced attention mechanisms. DSAT-GCN consists of a Gated Self-Attention module that enhances the capture of long-range dependencies, a Dynamic Feature Fusion module that adaptively integrates features across multiple scales, and a Spatial Graph Convolution module and a Temporal Convolution module that effectively model spatial relationships and temporal dynamics. The results show that our model significantly outperforms existing methods, achieving accuracies of 92.6% (X-Sub) and 96.9% (X-View) on NTU RGB+D 60, 89.2% (X-Sub) and 90.4% (X-View) on NTU RGB+D 120, and 89% (Top-1) and 94.5% (Top-5) on Kinetics Skeleton. Additionally, ablation studies confirm the critical contribution of each module to the overall performance of our model.
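To make the idea of combining a fixed skeleton topology with self-attention concrete, the following numpy sketch shows one spatial graph convolution step in which a sigmoid gate blends a normalized adjacency matrix with an attention map over joints. All shapes, the chain skeleton, the gating form, and the random weights are illustrative assumptions; this is not the published DSAT-GCN implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

V, C_in, C_out = 5, 3, 8            # joints, input channels, output channels (assumed)
X = rng.normal(size=(V, C_in))      # per-joint features for a single frame

# Toy skeleton: a chain of 5 joints, with self-loops, symmetrically normalized.
A = np.eye(V)
for i in range(V - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
A_norm = D_inv_sqrt @ A @ D_inv_sqrt

# Self-attention over joints: lets distant (non-adjacent) joints interact,
# which a fixed adjacency alone cannot express.
d_k = 4
Wq = rng.normal(size=(C_in, d_k))
Wk = rng.normal(size=(C_in, d_k))
scores = (X @ Wq) @ (X @ Wk).T / np.sqrt(d_k)
attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # row softmax

# A per-joint sigmoid gate (hypothetical form) blends the learned attention
# graph with the fixed, normalized skeleton topology.
g = 1.0 / (1.0 + np.exp(-rng.normal(size=(V, 1))))
A_dyn = g * attn + (1.0 - g) * A_norm

# Spatial graph convolution with the dynamic adjacency, followed by ReLU.
W = rng.normal(size=(C_in, C_out))
H = np.maximum(A_dyn @ X @ W, 0.0)
print(H.shape)  # (5, 8)
```

In a full model this per-frame step would be stacked with a temporal convolution over the frame axis; here the gate `g` is random purely to show the blending mechanism, whereas in training it would be learned.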

Original language: English
Title of host publication: Neural Information Processing
Subtitle of host publication: 31st International Conference, ICONIP 2024, Auckland, New Zealand, December 2–6, 2024, Proceedings, Part V
Editors: Mufti Mahmud, Maryam Doborjeh, Kevin Wong, Andrew Chi Sing Leung, Zohreh Doborjeh, M. Tanveer
Place of publication: Singapore
Publisher: Springer, Springer Nature
Pages: 197–211
Number of pages: 15
ISBN (Electronic): 9789819665884
ISBN (Print): 9789819665877
DOIs
Publication status: Published - 2025
Event: 31st International Conference on Neural Information Processing, ICONIP 2024 - Auckland, New Zealand
Duration: 2 Dec 2024 – 6 Dec 2024

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 15290
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 31st International Conference on Neural Information Processing, ICONIP 2024
Country/Territory: New Zealand
City: Auckland
Period: 2/12/24 – 6/12/24

Keywords

  • Human Activity Recognition
  • Skeleton-based HAR
  • Graph Convolutional Networks
  • Self-Attention Mechanisms
  • Dynamic Feature Fusion
