Abstract
Background: The prognosis, diagnosis, and treatment of many genetic disorders and familial diseases significantly improve if the family history (FH) of a patient is known. Such information is often written in the free text of clinical notes.
Objective: The aim of this study is to develop automated methods that enable access to FH data through natural language processing.
Methods: We performed information extraction by using transformers to extract disease mentions from notes. We also experimented with rule-based methods and coreference resolution techniques for extracting family member (FM) information from text. We evaluated different transfer learning strategies to improve the annotation of diseases. We provided a thorough error analysis of the contributing factors that affect such information extraction systems.
Results: Our experiments showed that the combination of domain-adaptive pretraining and intermediate-task pretraining achieved an F1 score of 81.63% for the extraction of diseases and FMs from notes when it was tested on a public shared task data set from the National Natural Language Processing Clinical Challenges (N2C2), providing a statistically significant improvement over the baseline (P<.001). In comparison, in the 2019 N2C2/Open Health Natural Language Processing Shared Task, the median F1 score of all 17 participating teams was 76.59%.
Conclusions: Our approach, which leverages a state-of-the-art named entity recognition model for disease mention detection coupled with a hybrid method for FM mention detection, achieved an effectiveness that was close to that of the top 3 systems participating in the 2019 N2C2 FH extraction challenge, with only the top system convincingly outperforming our approach in terms of precision.
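The F1 scores reported above combine precision and recall into a single figure. As a minimal sketch of how such a score can be computed over sets of extracted items, the snippet below uses exact matching on (family member, disease) pairs. The function name, the matching criterion, and the example pairs are all assumptions made for illustration; the official scorer of the shared task may match and weight items differently.

```python
def precision_recall_f1(predicted, gold):
    """Return (precision, recall, F1) for two collections of extracted items.

    Items are compared by exact equality; this is an illustrative
    assumption, not the shared task's official matching rule.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical (family member, disease) pairs, invented for illustration only.
gold = {("mother", "breast cancer"), ("father", "diabetes"), ("sister", "asthma")}
predicted = {("mother", "breast cancer"), ("father", "diabetes"), ("brother", "asthma")}

p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Because F1 is the harmonic mean of precision and recall, a system cannot reach a high score by favoring one of the two at the expense of the other.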
| Original language | English |
|---|---|
| Article number | e24020 |
| Number of pages | 21 |
| Journal | JMIR Medical Informatics |
| Volume | 9 |
| Issue number | 4 |
| Publication status | Published - 30 Apr 2021 |
Bibliographical note
Copyright the Author(s) 2021. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.
Correction: Rybinski M, Dai X, Singh S, Karimi S, Nguyen A, Correction: Extracting Family History Information From Electronic Health Records: Natural Language Processing Analysis, JMIR Med Inform 2021;9(5):e30153
URL: https://medinform.jmir.org/2021/5/e30153
DOI: 10.2196/30153
Keywords
- Information extraction
- Natural language processing
- Clinical natural language processing
- Named entity recognition
- Sequence tagging
- Neural language modeling
- Data augmentation