TY - JOUR
T1 - History-aware privacy budget allocation for model training on evolving data-sharing platforms
AU - Xiao, Linchang
AU - Zhang, Xianzhi
AU - Wu, Di
AU - Hu, Miao
AU - Zhou, Yipeng
AU - Yu, Shui
PY - 2024
Y1 - 2024
N2 - The publicly released machine learning (ML) models are susceptible to
malicious attacks (e.g., gradient leakage attacks), which may expose
sensitive training data of data-sharing platforms to untrusted third
parties. To preserve the
privacy of training data, differential privacy (DP) is exploited to
limit the amount of leaked privacy with a predefined budget, which in
fact is a non-recoverable resource. Considering DP, allocating privacy
budgets to ML queries is a non-trivial but crucial problem because a
certain amount of non-recoverable privacy budget will be consumed if a
datablock is assigned to a query once. Meanwhile, both datablocks and ML
queries are continuously generated, which further complicates the
problem. Most existing works simply rely on greedy algorithms that make
myopic allocation decisions, which can be far from optimal.
In this paper, we propose a novel History-aware Privacy Budget
Allocation (HPBA) algorithm for data-sharing platforms to address the
above challenges. Different from existing works, HPBA leverages
historical query records to approximate global ML query patterns so as
to overcome the drawback of shortsighted greedy-based algorithms.
Moreover, the performance of HPBA is theoretically guaranteed by
competitive analysis. A lightweight version called S-HPBA is proposed to
further reduce computation overhead by using fewer historical records.
Experimental results demonstrate that, compared to the state-of-the-art
baselines, HPBA and S-HPBA improve the average performance by 32.8% and
16.2% in terms of model accuracy, respectively.
AB - The publicly released machine learning (ML) models are susceptible to
malicious attacks (e.g., gradient leakage attacks), which may expose
sensitive training data of data-sharing platforms to untrusted third
parties. To preserve the
privacy of training data, differential privacy (DP) is exploited to
limit the amount of leaked privacy with a predefined budget, which in
fact is a non-recoverable resource. Considering DP, allocating privacy
budgets to ML queries is a non-trivial but crucial problem because a
certain amount of non-recoverable privacy budget will be consumed if a
datablock is assigned to a query once. Meanwhile, both datablocks and ML
queries are continuously generated, which further complicates the
problem. Most existing works simply rely on greedy algorithms that make
myopic allocation decisions, which can be far from optimal.
In this paper, we propose a novel History-aware Privacy Budget
Allocation (HPBA) algorithm for data-sharing platforms to address the
above challenges. Different from existing works, HPBA leverages
historical query records to approximate global ML query patterns so as
to overcome the drawback of shortsighted greedy-based algorithms.
Moreover, the performance of HPBA is theoretically guaranteed by
competitive analysis. A lightweight version called S-HPBA is proposed to
further reduce computation overhead by using fewer historical records.
Experimental results demonstrate that, compared to the state-of-the-art
baselines, HPBA and S-HPBA improve the average performance by 32.8% and
16.2% in terms of model accuracy, respectively.
UR - http://www.scopus.com/inward/record.url?scp=85202786397&partnerID=8YFLogxK
U2 - 10.1109/TSC.2024.3451187
DO - 10.1109/TSC.2024.3451187
M3 - Article
AN - SCOPUS:85202786397
SN - 1939-1374
VL - 17
SP - 3773
EP - 3788
JO - IEEE Transactions on Services Computing
JF - IEEE Transactions on Services Computing
IS - 6
ER -