TY - JOUR
T1 - Communication compression techniques in distributed deep learning: a survey
T2 - Journal of Systems Architecture
AU - Wang, Zeqin
AU - Wen, Ming
AU - Xu, Yuedong
AU - Zhou, Yipeng
AU - Wang, Jessie Hui
AU - Zhang, Liang
PY - 2023/9
Y1 - 2023/9
N2 - Training data and neural network models are growing ever larger, and the training time of deep learning becomes unbearably long on a single machine. To reduce the computation and storage burdens, distributed deep learning has been proposed to train a large neural network model collaboratively with multiple computing nodes in parallel. The unbalanced development of computation and communication capabilities has led to training time being dominated by communication time, making communication overhead a major challenge for efficient distributed deep learning. Communication compression is an effective method to alleviate this overhead, and it has evolved from simple random sparsification or quantization to versatile strategies and data structures. In this survey, existing communication compression techniques are reviewed and classified to provide a bird's-eye view. The main properties of each class of compression methods are analyzed, and their applications or theoretical convergence guarantees are described where relevant. This survey aims to help researchers and engineers understand the state of the art in communication compression techniques that accelerate the training of large deep learning models.
AB - Training data and neural network models are growing ever larger, and the training time of deep learning becomes unbearably long on a single machine. To reduce the computation and storage burdens, distributed deep learning has been proposed to train a large neural network model collaboratively with multiple computing nodes in parallel. The unbalanced development of computation and communication capabilities has led to training time being dominated by communication time, making communication overhead a major challenge for efficient distributed deep learning. Communication compression is an effective method to alleviate this overhead, and it has evolved from simple random sparsification or quantization to versatile strategies and data structures. In this survey, existing communication compression techniques are reviewed and classified to provide a bird's-eye view. The main properties of each class of compression methods are analyzed, and their applications or theoretical convergence guarantees are described where relevant. This survey aims to help researchers and engineers understand the state of the art in communication compression techniques that accelerate the training of large deep learning models.
KW - Distributed deep learning
KW - Communication compression
KW - Sparsification
KW - Quantization
UR - http://www.scopus.com/inward/record.url?scp=85164277579&partnerID=8YFLogxK
U2 - 10.1016/j.sysarc.2023.102927
DO - 10.1016/j.sysarc.2023.102927
M3 - Review article
AN - SCOPUS:85164277579
SN - 1383-7621
VL - 142
SP - 1
EP - 26
JO - Journal of Systems Architecture
JF - Journal of Systems Architecture
M1 - 102927
ER -