TY - JOUR
T1 - Exploring sparsity in graph transformers
AU - Liu, Chuang
AU - Zhan, Yibing
AU - Ma, Xueqi
AU - Ding, Liang
AU - Tao, Dapeng
AU - Wu, Jia
AU - Hu, Wenbin
AU - Du, Bo
PY - 2024/6
Y1 - 2024/6
N2 - Graph Transformers (GTs) have achieved impressive results on various graph-related tasks. However, the huge computational cost of GTs hinders their deployment and application, especially in resource-constrained environments. Therefore, in this paper, we explore the feasibility of sparsifying GTs, a significant yet under-explored topic. We first discuss the redundancy of GTs based on the characteristics of existing GT models, and then propose a comprehensive Graph Transformer SParsification (GTSP) framework that helps to reduce the computational complexity of GTs from four dimensions: the input graph data, attention heads, model layers, and model weights. Specifically, GTSP designs differentiable masks for each individual compressible component, enabling effective end-to-end pruning. We examine our GTSP through extensive experiments on prominent GTs, including GraphTrans, Graphormer, and GraphGPS. The experimental results demonstrate that GTSP effectively reduces computational costs, with only marginal decreases in accuracy or, in some instances, even improvements. For example, GTSP results in a 30% reduction in Floating Point Operations while contributing to a 1.8% increase in Area Under the Curve accuracy on the OGBG-HIV dataset. Furthermore, we provide several insights on the characteristics of attention heads and the behavior of attention mechanisms, all of which have immense potential to inspire future research endeavors in this domain. Our code is available at https://github.com/LiuChuang0059/GTSP.
KW - Graph transformers
KW - Graph sparse training
KW - Model pruning
KW - Graph classification
UR - http://www.scopus.com/inward/record.url?scp=85189014418&partnerID=8YFLogxK
U2 - 10.1016/j.neunet.2024.106265
DO - 10.1016/j.neunet.2024.106265
M3 - Article
C2 - 38552351
AN - SCOPUS:85189014418
SN - 0893-6080
VL - 174
SP - 1
EP - 9
JO - Neural Networks
JF - Neural Networks
M1 - 106265
ER -
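
Note: the abstract above describes differentiable masks over compressible components (e.g., attention heads) trained end-to-end for pruning. The following is a minimal illustrative sketch of that general idea, not the authors' GTSP implementation (see the repository linked in the abstract for the official code); the names HeadMask and sparsity_loss, the sigmoid gating, the L1-style penalty, and the 0.5 pruning threshold are all assumptions made for this example.

import torch
import torch.nn as nn

class HeadMask(nn.Module):
    """Learnable per-head gate in [0, 1] applied to multi-head attention output."""
    def __init__(self, num_heads: int):
        super().__init__()
        # One real-valued logit per head; sigmoid keeps the gate differentiable.
        self.logits = nn.Parameter(torch.zeros(num_heads))

    def forward(self, head_outputs: torch.Tensor) -> torch.Tensor:
        # head_outputs: (batch, num_heads, num_nodes, head_dim)
        gate = torch.sigmoid(self.logits).view(1, -1, 1, 1)
        return head_outputs * gate

    def sparsity_loss(self) -> torch.Tensor:
        # Encourage gates toward zero so redundant heads can be pruned later.
        return torch.sigmoid(self.logits).sum()

# Toy usage: add the sparsity penalty to the task loss, then prune low-gate heads.
mask = HeadMask(num_heads=8)
heads = torch.randn(2, 8, 16, 32)            # (batch, heads, nodes, head_dim)
task_loss = mask(heads).pow(2).mean()        # stand-in for the real training objective
loss = task_loss + 1e-3 * mask.sparsity_loss()
loss.backward()
keep = torch.sigmoid(mask.logits) > 0.5      # heads retained after training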