Abstract
Publicly released machine learning (ML) models are susceptible to
malicious attacks (e.g., gradient leakage attacks), which may expose sensitive training data of
data-sharing platforms to untrusted third parties. To preserve the
privacy of training data, differential privacy (DP) is used to
bound the amount of privacy leakage by a predefined budget, which is
a non-recoverable resource. Under DP, allocating privacy
budgets to ML queries is a non-trivial but crucial problem, because every
assignment of a datablock to a query permanently consumes part of that
datablock's privacy budget. Meanwhile, both datablocks and ML
queries are generated continuously, which further complicates the
problem. Most existing works simply rely on greedy algorithms that
make myopic allocation decisions, which are far from optimal.
In this paper, we propose a novel History-aware Privacy Budget
Allocation (HPBA) algorithm for data-sharing platforms to address the
above challenges. Unlike existing works, HPBA leverages
historical query records to approximate global ML query patterns and thereby
overcome the shortsightedness of greedy algorithms.
Moreover, the performance of HPBA is theoretically guaranteed by
competitive analysis. A lightweight version called S-HPBA is proposed to
further reduce computation overhead by using fewer historical records.
Experimental results demonstrate that, compared to the state-of-the-art
baselines, HPBA and S-HPBA improve the average performance by 32.8% and
16.2% in terms of model accuracy, respectively.
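
As a rough illustration of the budget-consumption setting described above (not the HPBA algorithm itself), the following minimal Python sketch models datablocks whose privacy budget is spent irreversibly, together with a myopic arrival-order allocator of the kind the abstract contrasts against. The names `Datablock`, `Query`, and `greedy_allocate`, the use of basic sequential composition (spent epsilons simply add up), the all-or-nothing serving rule, and all numeric values are assumptions made for illustration; none of them come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Datablock:
    """Hypothetical datablock with a fixed, non-recoverable DP budget."""
    name: str
    epsilon_total: float
    epsilon_spent: float = 0.0

    @property
    def remaining(self) -> float:
        return self.epsilon_total - self.epsilon_spent

    def consume(self, epsilon: float) -> bool:
        """Spend budget if enough remains; spent budget is never refunded."""
        if epsilon > self.remaining + 1e-12:
            return False
        self.epsilon_spent += epsilon
        return True


@dataclass
class Query:
    """Hypothetical ML query asking for `epsilon` on each listed datablock."""
    name: str
    epsilon: float
    blocks: list[str]


def greedy_allocate(blocks: dict[str, Datablock], queries: list[Query]) -> list[str]:
    """Myopic baseline (assumed): serve queries in arrival order if budget allows."""
    served = []
    for q in queries:
        wanted = [blocks[b] for b in q.blocks]
        # All-or-nothing: check every requested block before spending anything.
        if all(q.epsilon <= b.remaining + 1e-12 for b in wanted):
            for b in wanted:
                b.consume(q.epsilon)
            served.append(q.name)
    return served


blocks = {"d1": Datablock("d1", 1.0), "d2": Datablock("d2", 1.0)}
queries = [
    Query("q1", 0.8, ["d1"]),        # arrives first and drains most of d1
    Query("q2", 0.5, ["d1", "d2"]),  # rejected: d1 no longer has 0.5 left
    Query("q3", 0.5, ["d2"]),
]
served = greedy_allocate(blocks, queries)
print(served)  # ['q1', 'q3']
```

In this toy run the early query `q1` uses up budget on `d1` that `q2` would have needed, and that budget cannot be recovered afterwards. This is the kind of shortsighted decision that an allocator informed by historical query patterns would aim to avoid.
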
Original language | English |
---|---|
Number of pages | 16 |
Journal | IEEE Transactions on Services Computing |
DOIs | |
Publication status | E-pub ahead of print - 28 Aug 2024 |
Keywords
- budget allocation
- Computational modeling
- Data models
- data-sharing platforms
- differential privacy
- Differential privacy
- online algorithm
- Privacy
- Resource management
- Training
- Training data