TY - JOUR
T1 - Optimizing parameter mixing under constrained communications in parallel federated learning
AU - Liu, Xuezheng
AU - Yan, Zirui
AU - Zhou, Yipeng
AU - Wu, Di
AU - Chen, Xu
AU - Wang, Jessie Hui
PY - 2023/12
Y1 - 2023/12
N2 - In vanilla Federated Learning (FL) systems, a centralized parameter
server (PS) is responsible for collecting, aggregating, and distributing
model parameters among decentralized clients. However, the communication
link of a single PS can easily be overloaded by concurrent
communications with a massive number of clients. To overcome this
drawback, multiple PSes can be deployed to form a parallel FL (PFL)
system, in which each PS communicates only with a subset of clients and
its neighbor PSes. On one hand, each PS conducts training iterations with
the clients in its subset. On the other hand, PSes communicate with each
other periodically to mix their parameters so that they can eventually
reach a consensus. In this paper, we propose a novel parallel federated
learning algorithm called Fed-PMA, which optimizes such parallel FL under
constrained communications by conducting parallel parameter mixing and
averaging with theoretical guarantees. We formally analyze the convergence
rate of Fed-PMA with convex loss, and further derive the optimal number
of times each PS should mix with its neighbor PSes so as to maximize
the final model accuracy within a fixed span of training time. Our
theoretical analysis shows that PSes should mix their parameters more
frequently if the connections between PSes are sparse or the time cost of
mixing is low. Inspired by this analysis, we propose the Fed-APMA
algorithm, which adaptively determines the near-optimal number of
mixing times with non-convex loss under dynamic communication
conditions. Extensive experiments on realistic datasets demonstrate
that both Fed-PMA and its adaptive version Fed-APMA significantly
outperform state-of-the-art baselines.
UR - http://www.scopus.com/inward/record.url?scp=85151546287&partnerID=8YFLogxK
DO - 10.1109/TNET.2023.3257236
M3 - Article
AN - SCOPUS:85151546287
SN - 1063-6692
VL - 31
SP - 2640
EP - 2652
JO - IEEE/ACM Transactions on Networking
JF - IEEE/ACM Transactions on Networking
IS - 6
ER -