MRMondrian: scalable multidimensional anonymisation for big data privacy preservation

Xuyun Zhang*, Lianyong Qi*, Wanchun Dou, Qiang He, Christopher Leckie, Kotagiri Ramamohanarao, Zoran Salcic

*Corresponding author for this work

Research output: Contribution to journal › Article › Peer-review

Abstract

Scalable data processing platforms built on cloud computing are becoming increasingly attractive as infrastructure for supporting big data applications. However, privacy concerns are one of the major obstacles to using public cloud platforms. Multidimensional anonymisation, a global-recoding generalisation scheme for privacy-preserving data publishing, has recently attracted attention because it can balance data obfuscation and usability. Existing multidimensional anonymisation methods suffer from scalability problems on big data because of impractical serial I/O costs. Given the recursive nature of multidimensional anonymisation, parallelisation is a natural solution to these scalability issues. However, it remains challenging to apply existing distributed and parallel paradigms directly to recursive computation. In this paper, we propose a scalable approach to big data multidimensional anonymisation based on MapReduce, a state-of-the-art data processing paradigm. Our basic idea is to recursively partition a data set into smaller partitions using MapReduce until every partition fits in the memory of a computing node. A tree indexing structure is proposed to support the recursive computation. Moreover, we show the applicability of our approach to differential privacy. Experimental results on real-life data demonstrate that our approach significantly improves the scalability of multidimensional anonymisation over existing methods.
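The recursive partitioning described in the abstract can be sketched as a single-process Python simulation. This is not the authors' implementation; it rests on several assumptions: quasi-identifiers are numeric, cuts follow the classic Mondrian rule (split the widest normalised dimension at its median, accept only if each side keeps at least k records), a heap-style node numbering (root 1, children 2n and 2n+1) stands in for the paper's tree indexing structure, and a hypothetical `memory_limit` record count decides when a partition "fits in memory". Each iteration of the loop in `mr_mondrian` plays the role of one MapReduce round. In a real job the median of a large partition would have to be computed in a distributed way; here it is computed directly.

```python
from collections import defaultdict
from statistics import median_low

# Records are tuples of numeric quasi-identifier values (an assumption of this sketch).

def _norm_range(recs, d, widths):
    """Range of dimension d within recs, normalised by the whole data set's range."""
    vals = [r[d] for r in recs]
    return (max(vals) - min(vals)) / widths[d] if widths[d] else 0.0

def find_cut(recs, k, widths):
    """Mondrian-style cut: try dimensions widest-first, split at the median,
    and accept the first cut leaving at least k records on each side."""
    dims = sorted(range(len(widths)), key=lambda d: _norm_range(recs, d, widths), reverse=True)
    for d in dims:
        v = median_low([r[d] for r in recs])
        n_left = sum(1 for r in recs if r[d] <= v)
        if n_left >= k and len(recs) - n_left >= k:
            return d, v
    return None  # no allowable cut: this partition is a leaf

def mondrian_in_memory(recs, k, widths):
    """Recursive Mondrian on a partition small enough to process on one node."""
    cut = find_cut(recs, k, widths)
    if cut is None:
        return [recs]
    d, v = cut
    left = [r for r in recs if r[d] <= v]
    right = [r for r in recs if r[d] > v]
    return mondrian_in_memory(left, k, widths) + mondrian_in_memory(right, k, widths)

def shuffle_round(parts, cuts):
    """One simulated MapReduce round. Map: emit each record under its child
    node id (2n or 2n+1 for a split node n); shuffle: group records by id."""
    grouped = defaultdict(list)
    for nid, recs in parts.items():
        cut = cuts.get(nid)
        for r in recs:
            if cut is None:
                grouped[nid].append(r)  # node not split in this round
            else:
                d, v = cut
                grouped[2 * nid + int(r[d] > v)].append(r)
    return dict(grouped)

def mr_mondrian(records, k, memory_limit):
    """Split oversized partitions in rounds until each fits `memory_limit`
    (hypothetical record-count threshold) or cannot be cut, then finish in memory."""
    n_dims = len(records[0])
    widths = [max(r[d] for r in records) - min(r[d] for r in records) for d in range(n_dims)]
    parts = {1: list(records)}  # heap-style tree index: root is node 1
    while True:
        cuts = {}
        for nid, recs in parts.items():
            if len(recs) > memory_limit:
                cut = find_cut(recs, k, widths)
                if cut is not None:
                    cuts[nid] = cut
        if not cuts:
            break
        parts = shuffle_round(parts, cuts)
    leaves = []
    for nid in sorted(parts):
        leaves.extend(mondrian_in_memory(parts[nid], k, widths))
    return leaves

def generalise(partition):
    """Publish a partition as its per-dimension (min, max) bounding box."""
    return tuple((min(r[d] for r in partition), max(r[d] for r in partition))
                 for d in range(len(partition[0])))

if __name__ == "__main__":
    data = [(age, zone) for age in range(20, 60) for zone in range(5)]  # toy data
    leaves = mr_mondrian(data, k=5, memory_limit=50)
    print(len(leaves), generalise(leaves[0]))
```

In this sketch, partitions that already fit under the threshold are simply carried forward through later rounds, and each round's split decisions depend only on the partition being split, which is what makes the per-round work parallelisable. The node id of every record encodes its path from the root, so the tree index can be rebuilt from the grouped output alone.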
Original language: English
Pages (from-to): 125-139
Number of pages: 15
Journal: IEEE Transactions on Big Data
Volume: 8
Issue number: 1
Early online date: 27 Dec 2017
Publication status: Published - Feb 2022
Externally published: Yes

Keywords

  • Data privacy
  • Privacy
  • Scalability
  • Cloud computing
  • Partitioning algorithms
  • Indexing
  • Big data
  • Privacy preservation
  • MapReduce
  • Data anonymisation
  • Differential privacy
