Abstract
In vanilla Federated Learning (FL) systems, a centralized parameter
server (PS) is responsible for collecting and aggregating model
parameters from decentralized clients and distributing the aggregated
parameters back to them. However, the communication link of a single PS
can easily be overloaded by concurrent communications with a massive
number of clients. To overcome this
drawback, multiple PSes can be deployed to form a parallel FL (PFL)
system, in which each PS only communicates with a subset of clients and
its neighbor PSes. On one hand, each PS conducts iterations with clients
in its subset. On the other hand, PSes communicate with each other
periodically to mix their parameters so that they can finally reach a
consensus. In this paper, we propose a novel parallel federated learning
algorithm called Fed-PMA, which optimizes such parallel FL under
constrained communications by conducting parallel parameter mixing and
averaging with theoretical guarantees. We formally analyze the convergence
rate of Fed-PMA with convex loss, and further derive the optimal number
of times each PS should mix with its neighbor PSes so as to maximize
the final model accuracy within a fixed span of training time.
Our theoretical analysis shows that PSes should mix their parameters more
frequently when the connectivity between PSes is sparse or the time cost
of mixing is low. Inspired by this analysis, we propose the Fed-APMA
algorithm that can adaptively determine the near-optimal number of
mixing times with non-convex loss under dynamic communication
conditions. Extensive experiments with realistic datasets are carried
out to demonstrate that both Fed-PMA and its adaptive version Fed-APMA
significantly outperform the state-of-the-art baselines.
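The core mechanism the abstract describes, periodic parameter mixing among neighboring PSes, can be illustrated with gossip-style averaging. The following is a minimal sketch, not the paper's Fed-PMA implementation: the Metropolis-Hastings mixing weights, the ring topology, and the names `metropolis_weights` and `mix_parameters` are illustrative assumptions.

```python
import numpy as np

def metropolis_weights(adj):
    """Build a doubly stochastic mixing matrix from a PS adjacency matrix
    using Metropolis-Hastings weights (a common choice for gossip averaging;
    assumed here, not prescribed by the paper)."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adj[i, j]:
                W[i, j] = 1.0 / (1 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()  # self-weight keeps each row summing to 1
    return W

def mix_parameters(params, W, num_mixing):
    """One inter-PS communication round: each PS averages its parameter
    vector with its neighbors' vectors for num_mixing gossip steps."""
    for _ in range(num_mixing):
        params = W @ params  # row i becomes the weighted neighbor average at PS i
    return params

# Toy example: 4 PSes on a ring, each holding a 3-dimensional parameter vector.
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]])
W = metropolis_weights(adj)
params = np.random.randn(4, 3)  # one row per PS
mixed = mix_parameters(params, W, num_mixing=5)
# As num_mixing grows, all rows approach the global average (consensus).
```

Each additional mixing step pulls the PSes' parameters closer to their global average, which is precisely the trade-off the paper optimizes: more mixing improves consensus, but it consumes communication time that could otherwise be spent on iterations with clients.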
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 2640-2652 |
| Number of pages | 13 |
| Journal | IEEE/ACM Transactions on Networking |
| Volume | 31 |
| Issue number | 6 |
| Early online date | 27 Mar 2023 |
| DOIs | |
| Publication status | Published - Dec 2023 |