Seeing the forest through the trees: data leakage from partial transformer gradients

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

Abstract

Recent studies have shown that distributed machine learning is vulnerable to gradient inversion attacks, where private training data can be reconstructed by analyzing the gradients of the models shared during training. Previous attacks established that such reconstructions are possible using gradients from all parameters in the entire model. However, we hypothesize that most of the involved modules, or even their sub-modules, are at risk of training data leakage, and we validate such vulnerabilities in various intermediate layers of language models. Our extensive experiments reveal that gradients from a single Transformer layer, or even a single linear component with 0.54% of the parameters, are susceptible to training data leakage. Additionally, we show that applying differential privacy on gradients during training offers limited protection against this novel vulnerability of data disclosure.
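For illustration only, the snippet below is a minimal sketch of the general gradient-matching idea behind gradient inversion, applied to the gradient of a single linear component; it is not the paper's method, and all shapes, the surrogate loss, and variable names (embed_dim, target_x, dummy_x) are hypothetical.

```python
# Hypothetical sketch: reconstruct an input from the observed gradient of a
# single linear layer by optimizing a dummy input to match that gradient.
import torch

torch.manual_seed(0)
embed_dim, hidden_dim = 32, 64

# The "victim" sub-module whose gradient is shared in distributed training.
layer = torch.nn.Linear(embed_dim, hidden_dim)
params = list(layer.parameters())

# Private input and the gradient the attacker observes for it
# (a simple surrogate loss stands in for the real training objective).
target_x = torch.randn(1, embed_dim)
loss = layer(target_x).pow(2).mean()
true_grads = [g.detach() for g in torch.autograd.grad(loss, params)]

# Attacker: optimize a dummy input so its gradient matches the observed one.
dummy_x = torch.randn(1, embed_dim, requires_grad=True)
opt = torch.optim.Adam([dummy_x], lr=0.1)
for step in range(300):
    opt.zero_grad()
    dummy_loss = layer(dummy_x).pow(2).mean()
    dummy_grads = torch.autograd.grad(dummy_loss, params, create_graph=True)
    grad_dist = sum((dg - tg).pow(2).sum()
                    for dg, tg in zip(dummy_grads, true_grads))
    grad_dist.backward()
    opt.step()

print("reconstruction error:", (dummy_x - target_x).norm().item())
```

The same matching objective can in principle be restricted to any subset of parameters, which is the setting the paper studies for individual Transformer layers and linear components.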
Original language: English
Title of host publication: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Place of publication: Kerrville, TX
Publisher: Association for Computational Linguistics
Pages: 4786-4798
Number of pages: 13
ISBN (Electronic): 9798891761643
DOIs
Publication status: Published - 2024
Event: 2024 Conference on Empirical Methods in Natural Language Processing - Miami, United States
Duration: 12 Nov 2024 - 16 Nov 2024

Conference

Conference: 2024 Conference on Empirical Methods in Natural Language Processing
Abbreviated title: EMNLP 2024
Country/Territory: United States
City: Miami
Period: 12/11/24 - 16/11/24