Proximity-aware local-recoding anonymization with MapReduce for scalable big data privacy preservation in cloud

Xuyun Zhang*, Wanchun Dou, Jian Pei, Surya Nepal, Chi Yang, Chang Liu, Jinjun Chen

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

75 Citations (Scopus)


Cloud computing provides a promising, scalable IT infrastructure to support the processing of a variety of big data applications in sectors such as healthcare and business. Data sets in such applications, like electronic health records, often contain privacy-sensitive information, which raises privacy concerns if the information is released to or shared with third parties in the cloud. A practical and widely adopted technique for data privacy preservation is to anonymize data via generalization to satisfy a given privacy model. However, most existing privacy-preserving approaches are tailored to small-scale data sets and often fall short when encountering big data, due to their insufficiency or poor scalability. In this paper, we investigate the local-recoding problem for big data anonymization against proximity privacy breaches and attempt to identify a scalable solution to this problem. Specifically, we present a proximity privacy model that allows for semantic proximity of sensitive values and for multiple sensitive attributes, and we model the problem of local recoding as a proximity-aware clustering problem. To address this problem, we propose a scalable two-phase clustering approach consisting of a t-ancestors clustering algorithm (similar to k-means) and a proximity-aware agglomerative clustering algorithm. We design the algorithms with MapReduce to gain high scalability by performing data-parallel computation in the cloud. Extensive experiments on real-life data sets demonstrate that, compared with existing approaches, our approach significantly improves the capability of defending against proximity privacy breaches, as well as the scalability and time efficiency of local-recoding anonymization.
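As an illustration of the data-parallel pattern the abstract refers to, the following is a minimal sketch of one round of ordinary k-means written as a map step and a reduce step. It is not the paper's t-ancestors algorithm, which the abstract describes only as similar to k-means. The function names (`map_assign`, `reduce_recenter`, `kmeans_round`), the numeric tuple representation of records, and the squared Euclidean distance are all assumptions made for this example, and the map and reduce steps run in a single process here rather than on a MapReduce cluster.

```python
from collections import defaultdict


def map_assign(point, centers):
    """Map step (hypothetical): emit (index of nearest center, point)."""
    def sq_dist(a, b):
        # Squared Euclidean distance; an assumption, not the paper's metric.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))
    return nearest, point


def reduce_recenter(points):
    """Reduce step (hypothetical): average all points sharing one key."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans_round(points, centers):
    """Run one map/shuffle/reduce round and return the updated centers."""
    groups = defaultdict(list)
    for p in points:
        key, value = map_assign(p, centers)
        groups[key].append(value)  # "shuffle": group values by key
    # A center that attracted no points is kept unchanged.
    return [
        reduce_recenter(groups[i]) if groups[i] else centers[i]
        for i in range(len(centers))
    ]


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centers = [(1.0, 1.0), (9.0, 9.0)]
new_centers = kmeans_round(points, centers)
```

In an actual MapReduce job, the map calls would run in parallel over partitions of the data and the framework would perform the grouping by key. Repeating the round until the centers stop changing gives the usual k-means iteration.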

Original language: English
Pages (from-to): 2293-2307
Number of pages: 15
Journal: IEEE Transactions on Computers
Issue number: 8
Publication status: Published - Aug 2015
Externally published: Yes


  • Big data
  • Cloud computing
  • MapReduce
  • Data anonymization
  • Proximity privacy

