Hiding sensitive information in eHealth datasets

Jimmy Ming-Tai Wu, Gautam Srivastava, Alireza Jolfaei, Philippe Fournier-Viger, Jerry Chun-Wei Lin*

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

25 Citations (Scopus)


Privacy-preserving data mining (PPDM) has become a hot topic in both academic research and industry because it can discover implicit rules while hiding sensitive information through data sanitization. Many algorithms and heuristics based on evolutionary computation have been investigated to hide sensitive information through transaction deletion, but to date these algorithms apply only a single uniform threshold throughout the sanitization process. This approach is not applicable in real-world situations, especially for eHealth medical datasets. For example, a patient can still be identified if his or her record contains more confidential information (i.e., symptoms), which causes privacy threats and security leakage in medical applications. In this work, we investigate a novel methodology that sets different threshold values for sensitive patterns of different lengths within a Genetic Algorithm (GA)-based framework. As the pattern length increases, a tighter threshold is applied to better protect sensitive information and prevent individual patients from being identified in eHealth datasets. Two GA-based models are developed for data sanitization using record deletion. Experiments compare the designed models with traditional Evolutionary Computation (EC)-based PPDM approaches, and the results show that the designed methods offer greater protection than previous methods in terms of side effects. The designed models are therefore effective at hiding sensitive information in medical settings and can be used in real-world scenarios.
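The idea of a length-dependent threshold can be illustrated with a minimal Python sketch. It is not the authors' method: the geometric rule in `length_threshold`, the parameters `base` and `decay`, the helper names, and the toy records are all hypothetical, and "tighter" is assumed here to mean a lower support bound for longer patterns. The GA search that chooses which records to delete is not shown; the sketch only checks whether a given set of deletions hides every sensitive pattern.

```python
from typing import FrozenSet, List, Set

Transaction = FrozenSet[str]


def support(pattern: FrozenSet[str], transactions: List[Transaction]) -> int:
    """Count the transactions that contain every item of the pattern."""
    return sum(1 for t in transactions if pattern <= t)


def length_threshold(length: int, base: float = 0.5, decay: float = 0.6) -> float:
    """Hypothetical rule: the relative support threshold shrinks
    geometrically as the pattern gets longer (an assumption, not
    the formula used in the paper)."""
    return base * decay ** (length - 1)


def exposed_patterns(sensitive: List[FrozenSet[str]],
                     transactions: List[Transaction]) -> List[FrozenSet[str]]:
    """Return the sensitive patterns whose support count still reaches
    their length-dependent threshold in the current database."""
    size = len(transactions)
    return [p for p in sensitive
            if support(p, transactions) >= length_threshold(len(p)) * size]


def delete_records(transactions: List[Transaction],
                   deleted_ids: Set[int]) -> List[Transaction]:
    """Sanitize by record deletion: drop the transactions at the given indices."""
    return [t for i, t in enumerate(transactions) if i not in deleted_ids]


# Toy data (hypothetical): each record is a set of symptoms.
db = [
    frozenset({"fever", "cough"}),
    frozenset({"fever", "cough", "rash"}),
    frozenset({"fever", "cough", "rash"}),
    frozenset({"fever"}),
    frozenset({"cough"}),
    frozenset({"rash"}),
]
sensitive = [frozenset({"fever", "cough"}),
             frozenset({"fever", "cough", "rash"})]

before = exposed_patterns(sensitive, db)
after = exposed_patterns(sensitive, delete_records(db, {1, 2}))
```

With this toy data, both sensitive patterns are exposed in the original database, and neither remains exposed after records 1 and 2 are deleted. A GA-based sanitizer would search over candidate deletion sets and score each one by how many sensitive patterns remain exposed and by the side effects of the deletions.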

Original language: English
Pages (from-to): 169-180
Number of pages: 12
Journal: Future Generation Computer Systems
Early online date: 3 Dec 2020
Publication status: Published - Apr 2021


  • Privacy
  • Preserving
  • Data mining
  • eHealth
  • Dynamic threshold
  • Sensitive
  • Evolutionary computation

