Abstract
The massive increase in computing power and data storage capacity provisioned by cloud computing, together with advances in big data mining and analytics, has expanded the scope of information available to businesses, government, and individuals by orders of magnitude. Meanwhile, privacy protection is one of the most pressing concerns in big data and cloud applications: it requires strong preservation of customer privacy and has attracted considerable attention from both the IT industry and academia. Data anonymization provides an effective way to preserve data privacy, and the multidimensional anonymization scheme is widely adopted among existing anonymization schemes. However, existing multidimensional anonymization approaches suffer from severe scalability or IT cost issues when handling big data, because they cannot fully leverage cloud resources or be adapted cost-effectively to cloud environments. We therefore propose a scalable multidimensional anonymization approach for big data privacy preservation using MapReduce on the cloud. The approach introduces a highly scalable median-finding algorithm that combines the median-of-medians idea with a histogram technique, and it controls the recursion granularity to achieve cost-effectiveness. The corresponding MapReduce jobs are designed specifically for this approach, and experimental evaluations demonstrate that it significantly improves the scalability and cost-effectiveness of the multidimensional scheme over existing approaches.
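The abstract names a histogram-based median-finding step but gives no details. The following is a minimal single-machine sketch of the general histogram idea only. It is not the authors' algorithm: it leaves out the median-of-medians component and the MapReduce jobs, and the function name `histogram_median` and the parameters `buckets` and `local_threshold` are hypothetical. The `local_threshold` cut-off, below which the remaining values are simply sorted, is only loosely analogous to controlling recursion granularity.

```python
from typing import List, Sequence


def histogram_median(values: Sequence[float], buckets: int = 16,
                     local_threshold: int = 32) -> float:
    """Return the lower median by repeatedly narrowing a histogram.

    Illustrative sketch only; names and defaults are assumptions.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if buckets < 2:
        raise ValueError("buckets must be at least 2")

    candidates: List[float] = list(values)
    rank = (len(candidates) - 1) // 2  # 0-based rank of the lower median

    while len(candidates) > local_threshold:
        lo, hi = min(candidates), max(candidates)
        if lo == hi:
            return lo  # all remaining candidates are equal
        width = (hi - lo) / buckets
        if width == 0:
            break  # range too small to subdivide; fall back to sorting

        def bucket_of(v: float) -> int:
            # Clamp so that the maximum value falls in the last bucket.
            return min(int((v - lo) / width), buckets - 1)

        # Counting pass: in a distributed setting, each worker could build
        # such counts locally and the counts could then be summed.
        counts = [0] * buckets
        for v in candidates:
            counts[bucket_of(v)] += 1

        # Locate the bucket that contains the target rank.
        chosen = 0
        while rank >= counts[chosen]:
            rank -= counts[chosen]
            chosen += 1

        # Keep only the values in that bucket and repeat.
        candidates = [v for v in candidates if bucket_of(v) == chosen]

    # Few enough candidates remain: finish with a local sort.
    return sorted(candidates)[rank]
```

In a multidimensional (Mondrian-style) anonymization scheme, a median like this would typically serve as the split point for one quasi-identifier attribute when a partition is divided in two; how the paper distributes that work across MapReduce jobs is not described in the abstract.
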
Original language | English |
---|---|
Title of host publication | 2013 IEEE Third International Conference on Cloud and Green Computing |
Place of Publication | Los Alamitos, CA |
Publisher | Springer, Springer Nature |
Pages | 105-112 |
Number of pages | 8 |
ISBN (Electronic) | 9780769551142 |
Publication status | Published - 2013 |
Externally published | Yes |
Event | 3rd IEEE International Conference on Cloud and Green Computing (CGC), Karlsruhe, Germany. Duration: 30 Sept 2013 → 2 Oct 2013 |
Conference
Conference | 3rd IEEE International Conference on Cloud and Green Computing (CGC) |
---|---|
Country/Territory | Germany |
City | Karlsruhe |
Period | 30/09/13 → 2/10/13 |
Keywords
- big data
- cloud computing
- privacy preservation
- MapReduce
- multidimensional anonymization