TY - GEN
T1 - How much is too much? Leveraging ads audience estimation to evaluate public profile uniqueness
AU - Chen, Terence
AU - Chaabane, Abdelberi
AU - Tournoux, Pierre Ugo
AU - Kaafar, Mohamed Ali
AU - Boreli, Roksana
PY - 2013
Y1 - 2013
N2 - This paper addresses the important goal of quantifying the threat of linking external records to public Online Social Networks (OSN) user profiles, by providing a method to estimate the uniqueness of such profiles and by studying the amount of information carried by public profile attributes. Our first contribution is to leverage the Ads audience estimation platform of a major OSN to compute the information surprisal (IS) based uniqueness of public profiles, independently from the used profiles dataset. Then, we measure the quantity of information carried by the revealed attributes and evaluate the impact of the public release of selected combinations of these attributes on the potential to identify user profiles. Our measurement results, based on an unbiased sample of more than 400 thousand Facebook public profiles, show that, when disclosed in such profiles, current city has the highest individual attribute potential for unique identification and the combination of gender, current city and age can identify close to 55% of users to within a group of 20 and uniquely identify around 18% of users. We envisage the use of our methodology to assist both OSNs in designing better anonymization strategies when releasing user records and users to evaluate the potential for external parties to uniquely identify their public profiles and hence make it easier to link them with other data sources.
UR - http://www.scopus.com/inward/record.url?scp=84884940898&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-39077-7_12
DO - 10.1007/978-3-642-39077-7_12
M3 - Conference proceeding contribution
AN - SCOPUS:84884940898
SN - 9783642390760
T3 - Lecture Notes in Computer Science
SP - 225
EP - 244
BT - Privacy Enhancing Technologies
A2 - De Cristofaro, Emiliano
A2 - Wright, Matthew
PB - Springer, Springer Nature
CY - Berlin
T2 - 13th International Symposium on Privacy Enhancing Technologies, PETS 2013
Y2 - 10 July 2013 through 12 July 2013
ER -