TY - GEN
T1 - Fairness-aware Privacy-Preserving Record Linkage
AU - Vatsalan, Dinusha
AU - Wang, Joyce
AU - Henecka, Wilko
AU - Thorne, Brian
PY - 2020
Y1 - 2020
N2 - Record linkage aims to identify records from different databases that correspond to the same real-world entity, while Privacy-Preserving Record Linkage (PPRL) conducts the linkage in a privacy-preserving context where private and sensitive information about individuals is not compromised. Linking records is considered a classification task in which pairs of records from different databases are classified into matches (i.e. they refer to the same entity) or non-matches (i.e. they refer to different entities). Due to the absence of unique entity identifiers across databases, commonly available quasi-identifiers (QIDs), such as name, gender, address, and date of birth, are used to determine the linkage. The values in such QIDs are often prone to data errors and variations, making the linkage task challenging.
Fairness in classification is an emerging concept that determines how much a classifier deviates from producing correct predictions with equal probabilities for individuals across different protected groups based on sensitive features (e.g. gender or race). Developing classifiers that are fair with respect to such sensitive features is an important problem for classification in general and specifically for PPRL to mitigate the bias against sensitive and/or minority groups, for example against the female group due to a higher likelihood of variations in QIDs such as last name and address. While there has been increased interest in this field, fairness specifically in PPRL research has not been studied in the literature so far. Fairness for PPRL brings in specific challenges and requirements.
In this paper, we study fairness for PPRL classifiers, analyse appropriate fairness criteria/metrics for PPRL, study different forms of fairness-bias for PPRL, and investigate the effectiveness of using fairness-aware PPRL. Our experimental results conducted on real and synthetically biased datasets show the efficacy and significance of incorporating fairness constraints in the linkage, leading to higher linkage quality in terms of both correctness and fairness.
KW - Classification
KW - Correctness
KW - Entity resolution
KW - Fairness
KW - Privacy
UR - https://www.scopus.com/pages/publications/85101828446
U2 - 10.1007/978-3-030-66172-4_1
DO - 10.1007/978-3-030-66172-4_1
M3 - Conference proceeding contribution
SN - 9783030661717
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 3
EP - 18
BT - Data Privacy Management, Cryptocurrencies and Blockchain Technology
A2 - Garcia-Alfaro, Joaquin
A2 - Navarro-Arribas, Guillermo
A2 - Herrera-Joancomarti, Jordi
PB - Springer, Springer Nature
CY - Cham, Switzerland
T2 - 15th Data Privacy Management International Workshop (DPM 2020)
Y2 - 17 September 2020 through 18 September 2020
ER -