Measuring re-identification risk

CJ Carey, Travis Dick, Alessandro Epasto, Adel Javanmard, Josh Karlin, Shankar Kumar, Andres Muñoz Medina, Vahab Mirrokni, Gabriel Henrique Nunes, Sergei Vassilvitskii, Peilin Zhong

Research output: Contribution to journal › Conference paper › peer-review


Compact user representations (such as embeddings) form the backbone of personalization services. In this work, we present a new theoretical framework to measure re-identification risk in such user representations. Our framework, based on hypothesis testing, formally bounds the probability that an attacker may be able to obtain the identity of a user from their representation. As an application, we show how our framework is general enough to model important real-world applications such as Chrome's Topics API for interest-based advertising. We complement our theoretical bounds by showing provably good attack algorithms for re-identification that we use to estimate the re-identification risk in the Topics API. We believe this work provides a rigorous and interpretable notion of re-identification risk and a framework to measure it that can be used to inform real-world applications.
Original language: English
Article number: 149
Pages (from-to): 1-26
Number of pages: 26
Journal: Proceedings of the ACM on Management of Data
Issue number: 2
Publication status: Published - Jun 2023
Externally published: Yes
Event: 2023 ACM SIGMOD/PODS Conference - Seattle, United States
Duration: 18 Jun 2023 to 23 Jun 2023

Bibliographical note

Copyright the Author(s) 2023. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.


  • Re-identification risk
  • privacy
  • user representations

