
Clustering-based scalable indexing for multi-party privacy-preserving record linkage

Thilina Ranbaduge*, Dinusha Vatsalan, Peter Christen

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

Abstract

The identification of common sets of records in multiple databases has become an increasingly important subject in many application areas, including banking, health, and national security. Often privacy concerns and regulations prevent the owners of the databases from sharing any sensitive details of their records with each other or with any other party. The linkage of records in multiple databases while preserving privacy and confidentiality is an emerging research discipline known as privacy-preserving record linkage (PPRL). We propose a novel two-step indexing (blocking) approach for PPRL between multiple (more than two) parties. First, we generate small mini-blocks using a multi-bit Bloom filter splitting method; second, we merge these mini-blocks based on their similarity using a novel hierarchical canopy clustering technique. An empirical study conducted with large datasets of up to one million records shows that our approach is scalable with the size of the datasets and the number of parties, while providing better privacy than previous multi-party indexing approaches.
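The first step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the parameters (`num_bits`, `num_hashes`, `max_size`), the unkeyed SHA-256 hashing, and the rule of splitting on the most balanced bit position are all assumptions made for illustration. A real PPRL protocol would typically use keyed hashing with a secret shared between the parties, and the parties would need to agree on the bit positions. The second step (hierarchical canopy clustering) is not reproduced here; the `dice` function only shows one similarity measure that could be used when comparing Bloom filters, and its choice is likewise an assumption.

```python
import hashlib


def qgrams(value, q=2):
    """Return the set of character q-grams of a lower-cased string."""
    value = value.lower()
    return {value[i:i + q] for i in range(len(value) - q + 1)}


def bloom_filter(value, num_bits=64, num_hashes=2):
    """Encode a string's q-grams into a Bloom filter (tuple of 0/1 bits).

    Assumption: plain SHA-256 with a seed prefix stands in for the keyed
    hash functions a real PPRL protocol would use.
    """
    bits = [0] * num_bits
    for gram in qgrams(value):
        for seed in range(num_hashes):
            digest = hashlib.sha256(f"{seed}:{gram}".encode()).hexdigest()
            bits[int(digest, 16) % num_bits] = 1
    return tuple(bits)


def split_into_mini_blocks(filters, max_size=2, used=()):
    """Recursively split Bloom filters on one bit position at a time.

    Assumption: at each level the unused bit position whose 0/1 counts
    are most balanced is chosen; splitting stops once a block holds at
    most max_size filters or no bit position separates the filters.
    """
    if len(filters) <= max_size:
        return [filters]
    candidates = [p for p in range(len(filters[0])) if p not in used]
    if not candidates:
        return [filters]
    best = min(
        candidates,
        key=lambda p: abs(2 * sum(f[p] for f in filters) - len(filters)),
    )
    zeros = [f for f in filters if f[best] == 0]
    ones = [f for f in filters if f[best] == 1]
    if not zeros or not ones:
        # Even the most balanced bit does not separate the filters.
        return [filters]
    used = used + (best,)
    return (split_into_mini_blocks(zeros, max_size, used)
            + split_into_mini_blocks(ones, max_size, used))


def dice(a, b):
    """Dice coefficient of two Bloom filters of equal length."""
    common = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2 * common / total if total else 0.0


names = ["peter", "petra", "dinusha", "thilina", "smith", "smyth"]
filters = [bloom_filter(name) for name in names]
blocks = split_into_mini_blocks(filters, max_size=2)
```

Each filter ends up in exactly one mini-block. A merging step would then compare mini-blocks, for example by applying `dice` to representative filters, and combine similar ones into larger blocks.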
Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining
Subtitle of host publication: 19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, May 19–22, 2015, Proceedings, Part II
Editors: Tru Cao, Ee-Peng Lim, Zhi-Hua Zhou, Tu-Bao Ho, David Cheung, Hiroshi Motoda
Place of Publication: Cham, Switzerland
Publisher: Springer, Springer Nature
Pages: 549-561
Number of pages: 13
ISBN (Electronic): 9783319180328
ISBN (Print): 9783319180311
DOIs
Publication status: Published - 2015
Externally published: Yes
Event: 19th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2015 - Ho Chi Minh City, Viet Nam
Duration: 19 May 2015 → 19 May 2015

Other

Other: 19th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2015
Country/Territory: Viet Nam
City: Ho Chi Minh City
Period: 19/05/15 → 19/05/15

Keywords

  • Hierarchical canopy clustering
  • Bloom filters
  • Scalability
