Project Details
Description
Data analysis methods using machine learning (ML) can unlock valuable insights for improving revenue or quality of service from private, potentially proprietary, datasets. Because learning improves with data, large, high-quality training datasets yield ML models that make more accurate predictions on unseen data. These quality gains motivate multiple data owners to share and merge their datasets into larger training sets for federated training. For instance, financial institutions may wish to merge their transaction or lending datasets to improve the quality of trained ML models for fraud detection or for setting interest rates.
However, data owners are independent of each other and may have concerns about the safety of their own data in federated learning. Moreover, government regulations (e.g., the roll-out of the General Data Protection Regulation in the EU, the California Consumer Privacy Act, and the development of the Data Sharing and Release Bill in Australia) increasingly prohibit sharing customers' data without consent. There is a strong need to reconcile the tension between improving the quality of trained ML models and the privacy concerns around data sharing.
We plan to first quantify both privacy and accuracy by predicting the performance of the trained ML model. The major contribution of the proposed research is then the application of game theory to provide optimal trade-offs between accuracy and privacy in federated privacy-aware data analysis.
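As a toy illustration of this game-theoretic trade-off (not the project's actual method), the sketch below models each data owner choosing a differential-privacy budget for the data it contributes to federated training. The accuracy gain is an assumed concave function of the total information shared, each owner pays an assumed linear privacy cost, and best-response iteration finds an equilibrium. All constants, functional forms, and names here are illustrative assumptions.

```python
# Hypothetical model: owner i picks a privacy budget eps_i; the shared
# model's accuracy gain grows concavely with the total budget, while
# each owner bears a private cost proportional to its own eps_i.
import math

N_OWNERS = 3          # number of participating data owners (assumed)
PRIVACY_COST = 0.4    # per-unit cost of privacy loss (assumed)
EPS_MAX = 5.0         # upper bound on each owner's budget (assumed)

def accuracy_gain(total_eps: float) -> float:
    """Concave proxy for trained-model accuracy vs. shared information."""
    return math.log1p(total_eps)

def utility(eps_i: float, eps_others: float) -> float:
    """Owner's payoff: shared accuracy gain minus its own privacy cost."""
    return accuracy_gain(eps_i + eps_others) - PRIVACY_COST * eps_i

def best_response(eps_others: float) -> float:
    """Grid-search the eps_i that maximises one owner's utility."""
    grid = [k * 0.01 for k in range(int(EPS_MAX / 0.01) + 1)]
    return max(grid, key=lambda e: utility(e, eps_others))

def nash_equilibrium(iters: int = 100) -> list[float]:
    """Iterate best responses until the budget profile stabilises."""
    eps = [1.0] * N_OWNERS
    for _ in range(iters):
        for i in range(N_OWNERS):
            eps[i] = best_response(sum(eps) - eps[i])
    return eps

if __name__ == "__main__":
    eq = nash_equilibrium()
    print(f"equilibrium budgets: {[round(e, 2) for e in eq]}")
    print(f"predicted accuracy gain: {accuracy_gain(sum(eq)):.3f}")
```

Even in this simplified setting, the equilibrium exhibits the free-riding tension the project targets: owners who move later can contribute less while still benefiting from the accuracy gain funded by others' privacy loss.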
| Short title | Student scholarship- Wu |
|---|---|
| Status | Finished |
| Effective start/end date | 4/01/20 → 1/10/22 |