History-aware privacy budget allocation for model training on evolving data-sharing platforms

Linchang Xiao, Xianzhi Zhang, Di Wu, Miao Hu, Yipeng Zhou, Shui Yu

Research output: Contribution to journal › Article › peer-review

Abstract

Publicly released machine learning (ML) models are susceptible to malicious attacks (e.g., gradient leakage attacks), which may expose sensitive training data of data-sharing platforms to untrusted third parties. To preserve the privacy of training data, differential privacy (DP) is used to bound the amount of privacy leakage by a predefined budget, which is a non-recoverable resource. Under DP, allocating privacy budgets to ML queries is a non-trivial but crucial problem, because each time a datablock is assigned to a query, a certain amount of its non-recoverable privacy budget is consumed. Meanwhile, both datablocks and ML queries are continuously generated, which further complicates the problem. Most existing works simply rely on greedy algorithms that make myopic allocation decisions, which can be far from optimal. In this paper, we propose a novel History-aware Privacy Budget Allocation (HPBA) algorithm for data-sharing platforms to address the above challenges. Unlike existing works, HPBA leverages historical query records to approximate global ML query patterns, thereby overcoming the shortsightedness of greedy algorithms. Moreover, the performance of HPBA is theoretically guaranteed by competitive analysis. A lightweight version, called S-HPBA, is proposed to further reduce computation overhead by using fewer historical records. Experimental results demonstrate that, compared to state-of-the-art baselines, HPBA and S-HPBA improve average performance by 32.8% and 16.2% in terms of model accuracy, respectively.
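The allocation problem described in the abstract can be illustrated with a minimal sketch of the greedy baseline it criticizes. This is not the HPBA algorithm. It assumes basic sequential composition (the epsilon spent on a datablock simply adds up), a first-come-first-served admission rule, and hypothetical names (`DataBlock`, `Query`, `greedy_allocate`) chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DataBlock:
    """A datablock with a fixed, non-recoverable privacy budget (epsilon)."""
    block_id: str
    remaining: float


@dataclass
class Query:
    """An ML query requesting `epsilon` from each block in `block_ids`."""
    query_id: str
    block_ids: list
    epsilon: float


def greedy_allocate(blocks, queries):
    """Admit queries in arrival order whenever every requested block can pay."""
    admitted = []
    for q in queries:
        if all(blocks[b].remaining >= q.epsilon for b in q.block_ids):
            for b in q.block_ids:
                blocks[b].remaining -= q.epsilon  # spent budget never returns
            admitted.append(q.query_id)
    return admitted


# Hypothetical workload: two blocks with budget 1.0 each, three queries.
blocks = {"b1": DataBlock("b1", 1.0), "b2": DataBlock("b2", 1.0)}
queries = [
    Query("q1", ["b1", "b2"], 0.75),
    Query("q2", ["b1"], 0.5),
    Query("q3", ["b2"], 0.5),
]
admitted = greedy_allocate(blocks, queries)
print(admitted)
```

In this toy workload the greedy rule admits only `q1`, leaving 0.25 on each block, so `q2` and `q3` are both rejected, whereas skipping `q1` would have served two queries. This is the kind of myopic decision that an allocation informed by historical query patterns aims to avoid.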
Original language: English
Number of pages: 16
Journal: IEEE Transactions on Services Computing
Publication status: E-pub ahead of print - 28 Aug 2024

Keywords

  • budget allocation
  • Computational modeling
  • Data models
  • data-sharing platforms
  • differential privacy
  • Differential privacy
  • online algorithm
  • Privacy
  • Resource management
  • Training
  • Training data
