Scalable local-recoding anonymization using locality sensitive hashing for big data privacy preservation

Xuyun Zhang, Christopher Leckie, Wanchun Dou, Jinjun Chen, Ramamohanarao Kotagiri, Zoran Salcic

Research output: Chapter in Book/Report/Conference proceeding > Conference proceeding contribution

7 Citations (Scopus)

Abstract

While cloud computing has become an attractive platform for supporting data-intensive applications, a major obstacle to the adoption of cloud computing in sectors such as health and defense is the privacy risk associated with releasing datasets to third parties in the cloud for analysis. A widely adopted technique for data privacy preservation is to anonymize data via local recoding. However, most existing local-recoding techniques are either serial or distributed without directly optimizing scalability, thus rendering them unsuitable for big data applications. In this paper, we propose a highly scalable approach to local-recoding anonymization in cloud computing, based on Locality Sensitive Hashing (LSH). Specifically, a novel semantic distance metric is presented for use with LSH to measure the similarity between two data records. Then, LSH with the Min-Hash function family can be employed to divide datasets into multiple partitions for use with MapReduce to parallelize computation while preserving similarity. By using our efficient LSH-based scheme, we can anonymize each partition through the use of a recursive agglomerative k-member clustering algorithm. Extensive experiments on real-life datasets show that our approach improves the scalability and time-efficiency of local-recoding anonymization by orders of magnitude over existing approaches.
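To illustrate the partitioning idea mentioned in the abstract, the following is a minimal Python sketch of Min-Hash based LSH bucketing. It is not the authors' implementation: it uses plain Jaccard similarity over sets of attribute tokens rather than the paper's semantic distance metric, and the function names (`minhash_signature`, `lsh_partition`), the token format, and the parameters `num_hashes` and `rows_per_band` are all assumptions made for illustration. In a MapReduce setting, the bucket key computed here would plausibly serve as the key a mapper emits, so that similar records reach the same reducer.

```python
import hashlib
from collections import defaultdict


def _hash(token, seed):
    """Deterministic 64-bit hash of a token under a given seed (illustrative choice)."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes=8):
    """Min-Hash signature of a non-empty set of string tokens.

    Each seed plays the role of one hash function from the Min-Hash family;
    the signature keeps the minimum hash value seen under each of them.
    """
    return tuple(min(_hash(t, seed) for t in tokens) for seed in range(num_hashes))


def lsh_partition(records, num_hashes=8, rows_per_band=4):
    """Group record ids by the first band of their Min-Hash signature.

    `records` maps a record id to a set of tokens. Records whose token sets
    have high Jaccard similarity are likely (not guaranteed) to share a key.
    """
    partitions = defaultdict(list)
    for record_id, tokens in records.items():
        signature = minhash_signature(tokens, num_hashes)
        bucket_key = signature[:rows_per_band]  # hypothetical single-band scheme
        partitions[bucket_key].append(record_id)
    return dict(partitions)


if __name__ == "__main__":
    # Hypothetical quasi-identifier tokens, invented for this example.
    example = {
        "r1": {"age=30-39", "zip=461xx", "job=nurse"},
        "r2": {"age=30-39", "zip=461xx", "job=nurse"},
        "r3": {"age=60-69", "zip=902xx", "job=pilot"},
    }
    for key, ids in lsh_partition(example).items():
        print(ids)
```

Each resulting partition could then be anonymized independently, for example by a k-member clustering step as the abstract describes. A practical scheme would typically use several bands rather than one, trading partition size against the chance of separating similar records.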

Original language: English
Title of host publication: Proceedings of the 2016 ACM Conference on Information and Knowledge Management (CIKM 2016)
Place of Publication: New York
Publisher: Association for Computing Machinery (ACM)
Pages: 1793-1802
Number of pages: 10
ISBN (Electronic): 9781450340731
Publication status: Published - 2016
Externally published: Yes
Event: 25th ACM International Conference on Information and Knowledge Management, CIKM 2016 - Indianapolis, United States
Duration: 24 Oct 2016 to 28 Oct 2016

Conference

Conference: 25th ACM International Conference on Information and Knowledge Management, CIKM 2016
Country: United States
City: Indianapolis
Period: 24/10/16 to 28/10/16

Keywords

  • Big data
  • Cloud
  • LSH
  • MapReduce
  • Privacy preservation
