Extracting family history information from electronic health records: natural language processing analysis

Maciej Rybinski*, Xiang Dai, Sonit Singh, Sarvnaz Karimi, Anthony Nguyen

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Background: The prognosis, diagnosis, and treatment of many genetic disorders and familial diseases significantly improve if the family history (FH) of a patient is known. Such information is often written in the free text of clinical notes. 

Objective: The aim of this study is to develop automated methods that enable access to FH data through natural language processing. 

Methods: We used transformer-based models to extract disease mentions from clinical notes. We also experimented with rule-based methods for extracting family member (FM) information from text and with coreference resolution techniques. We evaluated different transfer learning strategies to improve the annotation of diseases. We provided a thorough error analysis of the contributing factors that affect such information extraction systems.
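
As an illustration of what a rule-based FM extractor can look like, the following is a minimal sketch and not the authors' implementation. The lexicon `FAMILY_TERMS`, the function name `find_family_members`, and the "NA" label for an unspecified side of the family are all assumptions made for this example; the abstract does not describe the actual rules.

```python
import re

# Hypothetical lexicon of family-member terms (an assumption for this sketch;
# the rules used in the paper are not given in the abstract).
FAMILY_TERMS = [
    "grandmother", "grandfather", "mother", "father", "sister",
    "brother", "aunt", "uncle", "cousin", "daughter", "son",
]

# Optional "maternal"/"paternal" modifier, then a family term, optional plural "s".
_PATTERN = re.compile(
    r"\b(?:(maternal|paternal)\s+)?(" + "|".join(FAMILY_TERMS) + r")s?\b",
    re.IGNORECASE,
)


def find_family_members(text):
    """Return (relation, side) pairs in order of appearance.

    side is "maternal", "paternal", or "NA" when no modifier is present.
    """
    mentions = []
    for match in _PATTERN.finditer(text):
        side = match.group(1).lower() if match.group(1) else "NA"
        mentions.append((match.group(2).lower(), side))
    return mentions


if __name__ == "__main__":
    note = "Her maternal aunt had breast cancer; father has diabetes."
    print(find_family_members(note))
```

A lexicon-and-pattern approach like this is easy to audit, but it cannot resolve pronouns or indirect references ("she", "his brother's son"), which is one reason a system might pair it with coreference resolution.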

Results: Our experiments showed that the combination of domain-adaptive pretraining and intermediate-task pretraining achieved an F1 score of 81.63% for the extraction of diseases and FMs from notes when it was tested on a public shared task data set from the National Natural Language Processing Clinical Challenges (N2C2), providing a statistically significant improvement over the baseline (P<.001). In comparison, in the 2019 N2C2/Open Health Natural Language Processing Shared Task, the median F1 score of all 17 participating teams was 76.59%. 
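
For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below computes it over sets of extracted items under an exact-match assumption; the function name `f1_score` and the example tuples are hypothetical, and the official shared task scorer may apply different matching rules.

```python
def f1_score(gold, predicted):
    """Compute precision, recall, and F1 over two sets of extracted items.

    Items count as correct only on exact match (an assumption of this sketch).
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up (family member, disease) pairs for illustration only.
    gold = {("aunt", "breast cancer"), ("father", "diabetes"), ("mother", "asthma")}
    predicted = {("aunt", "breast cancer"), ("father", "diabetes"), ("brother", "asthma")}
    print(f1_score(gold, predicted))
```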

Conclusions: Our approach, which leverages a state-of-the-art named entity recognition model for disease mention detection coupled with a hybrid method for FM mention detection, achieved an effectiveness that was close to that of the top 3 systems participating in the 2019 N2C2 FH extraction challenge, with only the top system convincingly outperforming our approach in terms of precision.

Original language: English
Article number: e24020
Number of pages: 21
Journal: JMIR Medical Informatics
Volume: 9
Issue number: 4
Publication status: Published - 30 Apr 2021

Bibliographical note

Copyright the Author(s) 2021. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.

Correction: Rybinski M, Dai X, Singh S, Karimi S, Nguyen A, Correction: Extracting Family History Information From Electronic Health Records: Natural Language Processing Analysis, JMIR Med Inform 2021;9(5):e30153
URL: https://medinform.jmir.org/2021/5/e30153
DOI: 10.2196/30153

Keywords

  • Information extraction
  • Natural language processing
  • Clinical natural language processing
  • Named entity recognition
  • Sequence tagging
  • Neural language modeling
  • Data augmentation
