TY - GEN
T1 - Dynamic self-attention gated spatial-temporal graph convolutional network for skeleton-based human activity recognition
AU - Xia, Yi
AU - Yongchareon, Sira
AU - Lutui, Raymond
AU - Sheng, Quan Z.
PY - 2025
Y1 - 2025
N2 - Human Activity Recognition (HAR) involves the automatic detection and classification of human actions from sensor data, enabling numerous applications in fields such as healthcare, security, and human-computer interaction. Skeleton-based HAR has gained significant attention due to its robustness against variations in appearance and environmental conditions, leveraging the structural information of human poses to recognize and classify activities accurately. Despite significant advancements, current deep learning models often rely on fixed topological structures and cannot directly capture long-range dependencies or distinguish subtle motions. To address these challenges, we propose a Dynamic Self-Attention Gated Spatial-Temporal Graph Convolutional Network (DSAT-GCN) that integrates graph convolutional networks (GCNs) with advanced attention mechanisms. The DSAT-GCN consists of a Gated Self-Attention module that enhances the capture of long-range dependencies, a Dynamic Feature Fusion module that adaptively integrates features across multiple scales, and Spatial Graph Convolution and Temporal Convolution modules that effectively model spatial relationships and temporal dynamics. The results show that our model significantly outperforms existing methods, achieving accuracies of 92.6% (X-Sub) and 96.9% (X-View) on NTU RGB+D 60, 89.2% (X-Sub) and 90.4% (X-View) on NTU RGB+D 120, as well as 89% (Top-1) and 94.5% (Top-5) on Kinetics Skeleton. Additionally, ablation studies confirm the critical contributions of each module to the overall performance of our model.
AB - Human Activity Recognition (HAR) involves the automatic detection and classification of human actions from sensor data, enabling numerous applications in fields such as healthcare, security, and human-computer interaction. Skeleton-based HAR has gained significant attention due to its robustness against variations in appearance and environmental conditions, leveraging the structural information of human poses to recognize and classify activities accurately. Despite significant advancements, current deep learning models often rely on fixed topological structures and cannot directly capture long-range dependencies or distinguish subtle motions. To address these challenges, we propose a Dynamic Self-Attention Gated Spatial-Temporal Graph Convolutional Network (DSAT-GCN) that integrates graph convolutional networks (GCNs) with advanced attention mechanisms. The DSAT-GCN consists of a Gated Self-Attention module that enhances the capture of long-range dependencies, a Dynamic Feature Fusion module that adaptively integrates features across multiple scales, and Spatial Graph Convolution and Temporal Convolution modules that effectively model spatial relationships and temporal dynamics. The results show that our model significantly outperforms existing methods, achieving accuracies of 92.6% (X-Sub) and 96.9% (X-View) on NTU RGB+D 60, 89.2% (X-Sub) and 90.4% (X-View) on NTU RGB+D 120, as well as 89% (Top-1) and 94.5% (Top-5) on Kinetics Skeleton. Additionally, ablation studies confirm the critical contributions of each module to the overall performance of our model.
KW - Human Activity Recognition
KW - Skeleton-based HAR
KW - Graph Convolutional Networks
KW - Self-Attention Mechanisms
KW - Dynamic Feature Fusion
UR - http://www.scopus.com/inward/record.url?scp=105009966124&partnerID=8YFLogxK
U2 - 10.1007/978-981-96-6588-4_14
DO - 10.1007/978-981-96-6588-4_14
M3 - Conference proceeding contribution
AN - SCOPUS:105009966124
SN - 9789819665877
T3 - Lecture Notes in Computer Science
SP - 197
EP - 211
BT - Neural Information Processing
A2 - Mahmud, Mufti
A2 - Doborjeh, Maryam
A2 - Wong, Kevin
A2 - Leung, Andrew Chi Sing
A2 - Doborjeh, Zohreh
A2 - Tanveer, M.
PB - Springer Nature
CY - Singapore
T2 - 31st International Conference on Neural Information Processing, ICONIP 2024
Y2 - 2 December 2024 through 6 December 2024
ER -