MRMondrian

scalable multidimensional anonymisation for big data privacy preservation

Xuyun Zhang, Lianyong Qi, Wanchun Dou, Qiang He, Christopher Leckie, Kotagiri Ramamohanarao, Zoran Salcic

Research output: Contribution to journal › Article

Abstract

Scalable data processing platforms built on cloud computing are becoming increasingly attractive as infrastructure for supporting big data applications. However, privacy concerns are one of the major obstacles to making use of public cloud platforms. Multidimensional anonymisation, a global-recoding generalisation scheme for privacy-preserving data publishing, has been a recent focus because of its capability to balance data obfuscation and usability. Existing multidimensional anonymisation methods suffer from scalability problems when handling big data because of their impractical serial I/O cost. Given the recursive nature of multidimensional anonymisation, parallelisation is an ideal solution to these scalability issues. However, it is still a challenge to use existing distributed and parallel paradigms directly for recursive computation. In this paper, we propose a scalable approach for big data multidimensional anonymisation based on MapReduce, a state-of-the-art data processing paradigm. Our basic idea is to partition a data set recursively into smaller partitions using MapReduce until every partition fits in the memory of a single computing node. A tree indexing structure is proposed to support the recursive computation. Moreover, we show the applicability of our approach to differential privacy. Experimental results on real-life data demonstrate that our approach can significantly improve the scalability of multidimensional anonymisation over existing methods.
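
The recursive partitioning mentioned in the abstract can be illustrated with a minimal, serial sketch of Mondrian-style median splitting, the kind of step that could run once a partition fits in the memory of one node. This is not the paper's MapReduce implementation or its tree index. The function names `mondrian` and `generalise`, the use of k-anonymity as the stopping rule, and the widest-range-first attribute choice are all assumptions made for illustration.

```python
from statistics import median_low


def mondrian(records, k):
    """Recursively split `records` into partitions of at least k records.

    `records` is a non-empty list of tuples of numeric quasi-identifiers.
    Hypothetical helper: a serial, in-memory sketch only.
    """
    dims = range(len(records[0]))
    # Try attributes from widest to narrowest value range (an assumed heuristic).
    by_span = sorted(
        dims,
        key=lambda d: max(r[d] for r in records) - min(r[d] for r in records),
        reverse=True,
    )
    for d in by_span:
        split = median_low([r[d] for r in records])
        left = [r for r in records if r[d] <= split]
        right = [r for r in records if r[d] > split]
        # Accept the cut only if both halves still hold at least k records.
        if len(left) >= k and len(right) >= k:
            return mondrian(left, k) + mondrian(right, k)
    # No allowable cut on any attribute: this partition is final.
    return [records]


def generalise(partition):
    """Replace every record by the (min, max) box covering its partition."""
    dims = range(len(partition[0]))
    box = tuple(
        (min(r[d] for r in partition), max(r[d] for r in partition)) for d in dims
    )
    return [box] * len(partition)
```

Calling `mondrian(records, k)` and then `generalise` on each returned partition yields a table in which every generalised record is shared by at least `k` rows. A distributed variant would need to replace the in-memory list comprehensions with jobs over partitioned data, which is the part the paper addresses.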
Original language: English
Journal: IEEE Transactions on Big Data
Publication status: E-pub ahead of print - 27 Dec 2017
Externally published: Yes

Keywords

  • Data privacy
  • Big Data
  • Privacy
  • Scalability
  • Cloud computing
  • Partitioning algorithms
  • Indexing
  • Privacy preservation
  • MapReduce
  • Data anonymisation
  • Differential privacy
