Flexible and scalable privacy assessment for very large datasets, with an application to official governmental microdata

Mário S. Alvim, Natasha Fernandes, Annabelle McIver, Carroll Morgan, Gabriel H. Nunes

Research output: Contribution to journal › Article › peer-review


Abstract

We present a systematic refactoring of the conventional treatment of privacy analyses, basing it on mathematical concepts from the framework of Quantitative Information Flow (QIF). The approach we suggest brings three principal advantages: it is flexible, allowing for precise quantification and comparison of privacy risks for attacks both known and novel; it can be computationally tractable for very large, longitudinal datasets; and its results are explainable both to politicians and to the general public. We apply our approach to a very large case study: the Educational Censuses of Brazil, curated by the governmental agency INEP, which comprise over 90 attributes of approximately 50 million individuals released longitudinally every year since 2007. These datasets have only very recently (2018–2021) attracted legislation to regulate their privacy, while at the same time continuing to maintain the openness that had been sought in Brazilian society. INEP's reaction to that legislation was the genesis of our project with them. In our conclusions here we share the scientific, technical, and communication lessons we learned in the process.
Original language: English
Pages (from-to): 378-399
Number of pages: 22
Journal: Proceedings on Privacy Enhancing Technologies
Volume: 2022
Issue number: 4
Publication status: Published - 2022
Event: The 22nd Privacy Enhancing Technologies Symposium - Macquarie University, Sydney, Australia
Duration: 11 Jul 2022 to 15 Jul 2022
Conference number: 2022
https://petsymposium.org/2022/

Bibliographical note

Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.

Keywords

  • privacy
  • formal methods
  • quantitative information flow
  • very large datasets
  • longitudinal datasets
