TY - GEN
T1 - FLM-TopK
T2 - 2025 IEEE Conference on Computer Communications, INFOCOM 2025
AU - Qiu, Wenqi
AU - Zhou, Yipeng
AU - Wang, Jinzhi
AU - Sheng, Quan Z.
AU - Cui, Laizhong
PY - 2025
Y1 - 2025
N2 - The past few years have witnessed the unprecedented capabilities of large language models (LLMs). To adapt LLMs to various downstream tasks, fine-tuning methods, e.g., Low-Rank Adaptation (LoRA), have been proposed to tune LLMs efficiently. Meanwhile, federated LLM tuning has emerged for refining LLMs with clients that own private data. In the federated tuning process, the server and clients frequently exchange fine-tuning gradients over the Internet, giving rise to a communication challenge. To overcome this challenge, most existing works employ quantization methods to compress gradients, because sparsification methods such as TopK incur heavy overhead for transmitting the position IDs (PIDs) of sparsified gradients. In this work, to expedite federated LLM tuning with a higher compression rate, we design the Federated LLM Tuning with TopK (FLM-TopK) algorithm. Specifically, FLM-TopK partitions gradients into intervals before compression. TopK is then applied separately to the gradients in each interval so that the overhead of representing PIDs is constrained. To optimize our algorithm, we empirically study the distribution of gradients, which follows a Gaussian distribution. Based on this Gaussian distribution, we formulate an optimization problem that minimizes the compression error by jointly optimizing the interval size and the sparsification rate per interval. We prove that this non-convex problem can be approximately solved by alternating optimization. To demonstrate the superiority of FLM-TopK, we conduct extensive experiments on nine public datasets. The results show that FLM-TopK significantly outperforms SOTA baselines, achieving a 6.42%-18.87% improvement in accuracy and a 17.07%-44.44% reduction in communication traffic.
UR - http://www.scopus.com/inward/record.url?scp=105011096378&partnerID=8YFLogxK
M3 - Conference proceeding contribution
AN - SCOPUS:105011096378
SN - 9798331543068
BT - IEEE INFOCOM 2025 - IEEE Conference on Computer Communications
PB - Institute of Electrical and Electronics Engineers (IEEE)
CY - Piscataway, NJ
Y2 - 19 May 2025 through 22 May 2025
ER -