A MapReduce based approach of scalable multidimensional anonymization for big data privacy preservation on cloud

Xuyun Zhang*, Chi Yang, Surya Nepal, Chang Liu, Wanchun Dou, Jinjun Chen

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

24 Citations (Scopus)


The massive increase in computing power and data storage capacity provisioned by cloud computing, together with advances in big data mining and analytics, has expanded the scope of information available to businesses, government, and individuals by orders of magnitude. Meanwhile, privacy protection is one of the most pressing concerns in big data and cloud applications; it demands strong preservation of customer privacy and has attracted considerable attention from both the IT industry and academia. Data anonymization is an effective means of data privacy preservation, and the multidimensional anonymization scheme is widely adopted among existing anonymization schemes. However, existing multidimensional anonymization approaches suffer from severe scalability or IT cost issues when handling big data, because they cannot fully leverage cloud resources or be adapted cost-effectively to cloud environments. We therefore propose a scalable multidimensional anonymization approach for big data privacy preservation using MapReduce on cloud. The approach introduces a highly scalable median-finding algorithm that combines the median-of-medians idea with a histogram technique, and it controls the recursion granularity to achieve cost-effectiveness. Corresponding MapReduce jobs are designed specifically for the approach, and experimental evaluations demonstrate that it significantly improves the scalability and cost-effectiveness of the multidimensional scheme over existing approaches.
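The abstract names a median-finding step built on histograms but gives no details, so the following is only a minimal in-memory sketch of the general idea, not the authors' algorithm. It assumes the data is already split into partitions (standing in for mapper inputs), omits the median-of-medians component entirely, and uses hypothetical function names (`map_histogram`, `reduce_histograms`, `histogram_median`). Each round builds per-partition histograms, merges them, and narrows the search to the single bin that contains the median rank.

```python
def map_histogram(partition, lo, hi, bins):
    """Mapper stand-in: per-bin (count, min, max) for values of one partition in [lo, hi]."""
    width = (hi - lo) / bins  # caller guarantees lo < hi
    out = {}
    for v in partition:
        if lo <= v <= hi:
            idx = min(int((v - lo) / width), bins - 1)  # top edge goes to last bin
            count, b_lo, b_hi = out.get(idx, (0, v, v))
            out[idx] = (count + 1, min(b_lo, v), max(b_hi, v))
    return out


def reduce_histograms(histograms):
    """Reducer stand-in: merge the per-partition histograms bin by bin."""
    merged = {}
    for hist in histograms:
        for idx, (count, b_lo, b_hi) in hist.items():
            if idx in merged:
                c, l, u = merged[idx]
                merged[idx] = (c + count, min(l, b_lo), max(u, b_hi))
            else:
                merged[idx] = (count, b_lo, b_hi)
    return merged


def histogram_median(partitions, bins=16, small=32):
    """Return the lower median of all values across partitions.

    Assumption: once at most `small` candidates remain, sorting them in one
    place is cheap. That threshold loosely mirrors the idea of limiting
    recursion depth, but it is an illustrative choice, not the paper's rule.
    """
    n = sum(len(p) for p in partitions)
    if n == 0:
        raise ValueError("no data")
    k = (n - 1) // 2  # 0-based rank of the lower median within [lo, hi]
    lo = min(min(p) for p in partitions if p)
    hi = max(max(p) for p in partitions if p)
    remaining = n
    while remaining > small and lo < hi:
        merged = reduce_histograms(
            map_histogram(p, lo, hi, bins) for p in partitions
        )
        seen = 0
        for idx in sorted(merged):
            count, b_lo, b_hi = merged[idx]
            if seen + count > k:
                # The median lies in this bin: re-rank k and shrink the range
                # to the bin's observed min and max.
                k -= seen
                lo, hi, remaining = b_lo, b_hi, count
                break
            seen += count
    candidates = sorted(v for p in partitions for v in p if lo <= v <= hi)
    return candidates[k]
```

For example, `histogram_median([[5, 1, 9], [3, 7], [2, 8, 6, 4]], bins=4, small=2)` narrows the range from [1, 9] to [5, 6] in one round and returns 5. Tracking each bin's observed minimum and maximum, rather than recomputing bin edges, keeps the narrowed range exact even with floating-point values.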

Original language: English
Title of host publication: 2013 IEEE Third International Conference on Cloud and Green Computing
Place of Publication: Los Alamitos, CA
Publisher: Springer, Springer Nature
Number of pages: 8
ISBN (Electronic): 9780769551142
Publication status: Published - 2013
Externally published: Yes
Event: 3rd IEEE International Conference on Cloud and Green Computing (CGC) - Karlsruhe, Germany
Duration: 30 Sep 2013 → 2 Oct 2013


Conference: 3rd IEEE International Conference on Cloud and Green Computing (CGC)


  • big data
  • cloud computing
  • privacy preservation
  • MapReduce
  • multidimensional anonymization
