Privacy-preserving record linkage for big data: current approaches and research challenges

Dinusha Vatsalan, Ziad Sehili, Peter Christen, Erhard Rahm

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

82 Citations (Scopus)


The growth of Big Data, especially personal data dispersed in multiple data sources, presents enormous opportunities and insights for businesses to explore and leverage the value of linked and integrated data. However, privacy concerns impede sharing or exchanging data for linkage across different organizations. Privacy-preserving record linkage (PPRL) aims to address this problem by identifying and linking records that correspond to the same real-world entity across several data sources held by different parties without revealing any sensitive information about these entities. PPRL is increasingly being required in many real-world application areas. Examples range from public health surveillance to crime and fraud detection, and national security. PPRL for Big Data poses several challenges, with the three major ones being (1) scalability to multiple large databases, due to their massive volume and the flow of data within Big Data applications, (2) achieving high quality results of the linkage in the presence of variety and veracity of Big Data, and (3) preserving privacy and confidentiality of the entities represented in Big Data collections. In this chapter, we describe the challenges of PPRL in the context of Big Data, survey existing techniques for PPRL, and provide directions for future research.
Original language: English
Title of host publication: Handbook of Big Data Technologies
Editors: Albert Y. Zomaya, Sherif Sakr
Place of Publication: Cham, Switzerland
Publisher: Springer, Springer Nature
Number of pages: 45
ISBN (Electronic): 9783319493404
ISBN (Print): 9783319493398
Publication status: Published - 2017
Externally published: Yes

Publication series

Name: Handbook of big data technologies


  • Record linkage
  • Privacy
  • Big data
  • Scalability

