TY - JOUR
T1 - Measuring re-identification risk
AU - Carey, CJ
AU - Dick, Travis
AU - Epasto, Alessandro
AU - Javanmard, Adel
AU - Karlin, Josh
AU - Kumar, Shankar
AU - Medina, Andres Muñoz
AU - Mirrokni, Vahab
AU - Nunes, Gabriel Henrique
AU - Vassilvitskii, Sergei
AU - Zhong, Peilin
N1 - Copyright the Author(s) 2023. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.
PY - 2023/6
Y1 - 2023/6
AB - Compact user representations (such as embeddings) form the backbone of personalization services. In this work, we present a new theoretical framework to measure re-identification risk in such user representations. Our framework, based on hypothesis testing, formally bounds the probability that an attacker may be able to obtain the identity of a user from their representation. As an application, we show how our framework is general enough to model important real-world applications such as Chrome's Topics API for interest-based advertising. We complement our theoretical bounds by showing provably good attack algorithms for re-identification that we use to estimate the re-identification risk in the Topics API. We believe this work provides a rigorous and interpretable notion of re-identification risk and a framework to measure it that can be used to inform real-world applications.
KW - Re-identification risk
KW - privacy
KW - user representations
U2 - 10.1145/3589294
DO - 10.1145/3589294
M3 - Conference paper
SN - 2836-6573
VL - 1
SP - 1
EP - 26
JO - Proceedings of the ACM on Management of Data
JF - Proceedings of the ACM on Management of Data
IS - 2
M1 - 149
T2 - 2023 ACM SIGMOD/PODS Conference
Y2 - 18 June 2023 through 23 June 2023
ER -