Scalable iterative implementation of Mondrian for big data multidimensional anonymisation

Xuyun Zhang*, Lianyong Qi, Qiang He, Wanchun Dou

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

1 Citation (Scopus)

Abstract

Scalable data processing platforms built on cloud computing are becoming increasingly attractive as infrastructure for supporting big data mining and analytics applications. However, privacy concerns are one of the major obstacles to making use of public cloud platforms. In practice, data generalisation is a widely adopted anonymisation technique for preserving privacy in data publishing or sharing scenarios. Multidimensional anonymisation, a global-recoding generalisation scheme, has been a recent focus due to its capability of balancing data obfuscation and data usability. Existing approaches handle the scalability problem of multidimensional anonymisation for data sets much larger than main memory by storing data on disk at runtime, which incurs an impractical serial I/O cost. In this paper, we propose a scalable iterative multidimensional anonymisation approach for big data sets based on MapReduce, a state-of-the-art large-scale data processing paradigm. Our basic and intuitive idea is to partition a large data set recursively into smaller data partitions using MapReduce until every partition fits in the memory of a single computing node. A tree indexing structure is proposed to achieve recursive computation on MapReduce for data partitioning in multidimensional anonymisation. Experimental results on real-life data sets demonstrate that the proposed approach can significantly improve the scalability and time-efficiency of multidimensional anonymisation over existing approaches, and is therefore applicable to big data applications.
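
The partitioning idea in the abstract can be illustrated with a small single-machine sketch. This is not the authors' implementation: a worklist stands in for successive MapReduce rounds, a record count (the hypothetical `memory_limit` parameter) stands in for the memory capacity of a node, all quasi-identifiers are assumed numeric, and the paper's tree indexing structure is not modelled. The function names are illustrative only.

```python
from collections import deque
from statistics import median_low


def split_on_median(records, k):
    """Try one Mondrian-style cut at the median of the widest attribute.

    Returns (left, right), or None if no attribute yields a cut that
    leaves at least k records on both sides.
    """
    dims = range(len(records[0]))
    # Try attributes from the widest value range to the narrowest.
    by_width = sorted(
        dims,
        key=lambda d: max(r[d] for r in records) - min(r[d] for r in records),
        reverse=True,
    )
    for d in by_width:
        m = median_low([r[d] for r in records])
        left = [r for r in records if r[d] <= m]
        right = [r for r in records if r[d] > m]
        if len(left) >= k and len(right) >= k:
            return left, right
    return None


def mondrian_in_memory(records, k):
    """Recursive partitioning for a partition assumed to fit in memory."""
    cut = split_on_median(records, k)
    if cut is None:
        return [records]
    left, right = cut
    return mondrian_in_memory(left, k) + mondrian_in_memory(right, k)


def iterative_partition(records, k, memory_limit):
    """Cut oversized partitions one at a time, then finish small ones in memory.

    Each pass over `pending` is a stand-in (assumption) for one
    distributed partitioning round.
    """
    pending = deque([records])
    groups = []
    while pending:
        part = pending.popleft()
        if len(part) <= memory_limit:
            groups.extend(mondrian_in_memory(part, k))
            continue
        cut = split_on_median(part, k)
        if cut is None:
            groups.append(part)  # cannot be cut further without violating k
        else:
            pending.extend(cut)
    return groups


def generalise(group):
    """Summarise a group by the (min, max) range of each attribute."""
    dims = range(len(group[0]))
    return tuple((min(r[d] for r in group), max(r[d] for r in group)) for d in dims)


if __name__ == "__main__":
    # Made-up (age, postcode) records, for illustration only.
    data = [(25, 1000), (27, 1005), (31, 1010), (35, 1020),
            (42, 1100), (47, 1105), (52, 1110), (58, 1120)]
    for g in iterative_partition(data, k=2, memory_limit=4):
        print(generalise(g), len(g))
```

Every resulting group holds at least `k` records, so publishing the generalised ranges in place of the original values gives k-anonymity over the chosen attributes in this toy setting.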

Original language: English
Title of host publication: Security, Privacy and Anonymity in Computation, Communication and Storage
Subtitle of host publication: SpaCCS 2016 International Workshops, TrustData, TSP, NOPE, DependSys, BigDataSPT, and WCSSC, Zhangjiajie, China, November 16–18, 2016, Proceedings
Editors: Guojun Wang, Indrakshi Ray, Jose M. Alcaraz Calero, Sabu M. Thampi
Place of Publication: Cham, Switzerland
Publisher: Springer, Springer Nature
Pages: 311-320
Number of pages: 10
ISBN (Print): 9783319491448
Publication status: Published - 2016
Externally published: Yes
Event: 9th International Conference on Security, Privacy and Anonymity in Computation, Communication and Storage, SpaCCS 2016 - Zhangjiajie, China
Duration: 16 Nov 2016 to 18 Nov 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10067 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 9th International Conference on Security, Privacy and Anonymity in Computation, Communication and Storage, SpaCCS 2016
Country/Territory: China
City: Zhangjiajie
Period: 16/11/16 to 18/11/16

Keywords

  • Big data
  • Cloud computing
  • Data anonymisation
  • MapReduce
  • Privacy preservation
