TY - GEN
T1 - The value of collaboration in convex machine learning with differential privacy
AU - Wu, Nan
AU - Farokhi, Farhad
AU - Smith, David
AU - Kaafar, Mohamed Ali
PY - 2020
Y1 - 2020
N2 - In this paper, we apply machine learning to distributed private data owned by multiple data owners: entities with access to non-overlapping training datasets. We use noisy, differentially-private gradients to minimize the fitness cost of the machine learning model using stochastic gradient descent. We quantify the quality of the trained model, using the fitness cost, as a function of the privacy budget and the size of the distributed datasets to capture the trade-off between privacy and utility in machine learning. This way, we can predict the outcome of collaboration among privacy-aware data owners prior to executing potentially computationally-expensive machine learning algorithms. In particular, we show that the difference between the fitness of the model trained with differentially-private gradient queries and the fitness of the model trained in the absence of any privacy concerns is inversely proportional to the square of the size of the training datasets and the square of the privacy budget. We validate this performance prediction against the actual performance of the proposed privacy-aware learning algorithms, applied to financial datasets: determining interest rates of loans using regression, and detecting credit card fraud using support vector machines.
KW - Differential privacy
KW - Machine learning
KW - Stochastic gradient algorithm
UR - http://www.scopus.com/inward/record.url?scp=85084279655&partnerID=8YFLogxK
DO - 10.1109/SP40000.2020.00025
M3 - Conference proceeding contribution
AN - SCOPUS:85084279655
T3 - Proceedings - IEEE Symposium on Security and Privacy
SP - 304
EP - 317
BT - Proceedings - 2020 IEEE Symposium on Security and Privacy, SP 2020
PB - Institute of Electrical and Electronics Engineers (IEEE)
CY - Los Alamitos, CA
T2 - 41st IEEE Symposium on Security and Privacy, SP 2020
Y2 - 18 May 2020 through 21 May 2020
ER -