Abstract
While cloud computing has become an attractive platform for supporting data-intensive applications, a major obstacle to its adoption in sectors such as health and defense is the privacy risk of releasing datasets to third parties in the cloud for analysis. A widely adopted technique for preserving data privacy is to anonymize data via local recoding. However, most existing local-recoding techniques are either serial, or distributed without directly optimizing for scalability, which renders them unsuitable for big data applications. In this paper, we propose a highly scalable approach to local-recoding anonymization in cloud computing, based on Locality Sensitive Hashing (LSH). Specifically, we present a novel semantic distance metric for use with LSH to measure the similarity between two data records. LSH with the Min-Hash function family is then employed to divide datasets into multiple partitions while preserving similarity, so that computation can be parallelized with MapReduce. Each partition is then anonymized with a recursive agglomerative k-member clustering algorithm. Extensive experiments on real-life datasets show that our approach improves the scalability and time-efficiency of local-recoding anonymization by orders of magnitude over existing approaches.
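The partitioning step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each record is a non-empty dictionary of categorical attributes, it substitutes plain Jaccard similarity over `attribute=value` tokens for the paper's semantic distance metric, and the names `record_tokens`, `minhash_signature`, and `lsh_partition` are hypothetical. In a MapReduce setting, the partition key computed here would plausibly serve as the map output key, so that similar records reach the same reducer.

```python
import hashlib
from collections import defaultdict


def _hash(seed: int, token: str) -> int:
    """Deterministic 64-bit hash of a token; each seed acts as one hash function."""
    digest = hashlib.blake2b(f"{seed}:{token}".encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def record_tokens(record: dict) -> set:
    """Represent a record as a set of 'attribute=value' tokens (an assumption of this sketch)."""
    return {f"{attr}={value}" for attr, value in record.items()}


def minhash_signature(tokens: set, num_hashes: int = 4) -> tuple:
    """Min-Hash signature: the minimum hash value of the token set under each seed.

    For one seed, two sets produce the same minimum with probability equal to
    their Jaccard similarity. Assumes tokens is non-empty.
    """
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(num_hashes))


def lsh_partition(records: list, num_hashes: int = 4) -> dict:
    """Assign every record index to exactly one partition, keyed by its signature.

    Records with identical signatures share a partition; more hash functions
    give smaller, more homogeneous partitions.
    """
    partitions = defaultdict(list)
    for idx, record in enumerate(records):
        key = minhash_signature(record_tokens(record), num_hashes)
        partitions[key].append(idx)
    return dict(partitions)


if __name__ == "__main__":
    sample = [
        {"age": "30", "zip": "47401"},
        {"zip": "47401", "age": "30"},
        {"age": "65", "zip": "10001"},
    ]
    for key, members in lsh_partition(sample).items():
        print(members)
```

Using the full signature as the key keeps the partitions disjoint, which suits a per-partition anonymization step; the choice of `num_hashes` trades partition size against similarity within a partition and would need tuning against the privacy parameter k.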
Original language | English |
---|---|
Title of host publication | Proceedings of the 2016 ACM Conference on Information and Knowledge Management (CIKM 2016) |
Place of Publication | New York |
Publisher | Association for Computing Machinery (ACM) |
Pages | 1793-1802 |
Number of pages | 10 |
ISBN (Electronic) | 9781450340731 |
Publication status | Published - 2016 |
Externally published | Yes |
Event | 25th ACM International Conference on Information and Knowledge Management, CIKM 2016 - Indianapolis, United States. Duration: 24 Oct 2016 → 28 Oct 2016 |
Conference
Conference | 25th ACM International Conference on Information and Knowledge Management, CIKM 2016 |
---|---|
Country | United States |
City | Indianapolis |
Period | 24/10/16 → 28/10/16 |
Keywords
- Big data
- Cloud
- LSH
- MapReduce
- Privacy preservation