TY - JOUR
T1 - Privacy-preserving deep learning based record linkage
AU - Ranbaduge, Thilina
AU - Vatsalan, Dinusha
AU - Ding, Ming
N1 - Copyright the Author(s) 2023. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher
PY - 2024/11
Y1 - 2024/11
N2 - Deep learning-based linkage of records across different databases is becoming
increasingly useful in data integration and mining applications to
discover new insights from multiple data sources. However, due to
privacy and confidentiality concerns, organisations are often unwilling
or not allowed to share their sensitive data with any external parties, thus
making it challenging to build/train deep learning models for record
linkage across different organisations' databases. To overcome this
limitation, we propose the first deep learning-based multi-party
privacy-preserving record linkage (PPRL) protocol that can be used to
link sensitive databases held by multiple different organisations. In
our approach, each database owner first trains a local deep learning
model, which is then uploaded to a secure environment and securely
aggregated to create a global model. The global model is then used by a
linkage unit to classify unlabelled record pairs as matches or
non-matches. We utilise differential privacy to achieve provable privacy
protection against re-identification attacks. We evaluate the linkage
quality and scalability of our approach using several large real-world
databases, showing that it can achieve high linkage quality while
providing sufficient privacy protection against existing attacks.
UR - http://www.scopus.com/inward/record.url?scp=85180308262&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2023.3342757
DO - 10.1109/TKDE.2023.3342757
M3 - Article
AN - SCOPUS:85180308262
SN - 1041-4347
VL - 36
SP - 6839
EP - 6850
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 11
ER -