A hybrid approach for scalable sub-tree anonymization over big data using MapReduce on cloud

Xuyun Zhang*, Chang Liu, Surya Nepal, Chi Yang, Wanchun Dou, Jinjun Chen

*Corresponding author for this work

Research output: Contribution to journal › Article


Abstract

In big data applications, data privacy is one of the most pressing concerns, because processing large-scale privacy-sensitive data sets often requires computation resources provisioned by public cloud services. Sub-tree data anonymization is a widely adopted scheme for anonymizing data sets to preserve privacy. Top-Down Specialization (TDS) and Bottom-Up Generalization (BUG) are two ways to achieve sub-tree anonymization. However, existing approaches to sub-tree anonymization lack parallelization capability and therefore do not scale to big data in the cloud. Moreover, TDS and BUG each perform poorly for certain values of the k-anonymity parameter. In this paper, we propose a hybrid approach that combines TDS and BUG for efficient sub-tree anonymization over big data. Further, we design MapReduce algorithms for the two components (TDS and BUG) to gain high scalability. Experimental evaluation demonstrates that the hybrid approach significantly improves the scalability and efficiency of the sub-tree anonymization scheme over existing approaches. (c) 2014 Elsevier Inc. All rights reserved.
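
To make the k-anonymity parameter and sub-tree generalization concrete, here is a minimal single-machine Python sketch. It is an illustration only, not the paper's MapReduce algorithms: the taxonomy tree, the sample records, and the names `PARENT`, `generalize`, and `is_k_anonymous` are all hypothetical. A "cut" through the taxonomy decides how far each value is generalized; because a parent in the cut replaces all of its children at once, the generalization is sub-tree based.

```python
from collections import Counter

# Hypothetical taxonomy tree for one quasi-identifier (child -> parent).
PARENT = {
    "Engineer": "Professional",
    "Lawyer": "Professional",
    "Dancer": "Artist",
    "Writer": "Artist",
    "Professional": "Any",
    "Artist": "Any",
}


def generalize(value, cut):
    """Walk up the taxonomy until reaching a node that belongs to the cut."""
    while value not in cut:
        value = PARENT[value]  # raises KeyError if the cut does not cover value
    return value


def is_k_anonymous(records, cut, k):
    """Return True if every generalized value occurs at least k times."""
    counts = Counter(generalize(r, cut) for r in records)
    return all(c >= k for c in counts.values())


# Hypothetical data set with a single quasi-identifier column.
records = ["Engineer", "Engineer", "Lawyer", "Dancer", "Writer", "Writer"]

leaf_cut = {"Engineer", "Lawyer", "Dancer", "Writer"}  # no generalization
subtree_cut = {"Professional", "Artist"}               # one level up
root_cut = {"Any"}                                     # fully generalized

print(is_k_anonymous(records, leaf_cut, 2))
print(is_k_anonymous(records, subtree_cut, 3))
print(is_k_anonymous(records, root_cut, 6))
```

In these terms, a top-down method would start from `root_cut` and move the cut downward while k-anonymity still holds, whereas a bottom-up method would start from `leaf_cut` and move it upward until k-anonymity is reached. How far the cut must travel from either starting point depends on k, which gives some intuition for why each direction can be slow for certain values of k.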

Original language: English
Pages (from-to): 1008-1020
Number of pages: 13
Journal: Journal of Computer and System Sciences
Volume: 80
Issue number: 5
DOIs
Publication status: Published - Aug 2014
Externally published: Yes

Keywords

  • Big data
  • Cloud computing
  • Data anonymization
  • Privacy preservation
  • MapReduce
