Evaluation of a rule-based method for epidemiological document classification towards the automation of systematic reviews

George Karystianis*, Kristina Thayer, Mary Wolfe, Guy Tsafnat

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

14 Citations (Scopus)


Introduction: Most data extraction efforts in epidemiology are focused on obtaining targeted information from clinical trials. In contrast, limited research has been conducted on the identification of information from observational studies, a major source for human evidence in many fields, including environmental health. The recognition of key epidemiological information (e.g., exposures) through text mining techniques can assist in the automation of systematic reviews and other evidence summaries.

Method: We designed and applied a knowledge-driven, rule-based approach to identify targeted information (study design, participant population, exposure, outcome, confounding factors, and the country where the study was conducted) from abstracts of epidemiological studies included in several systematic reviews of environmental health exposures. The rules were based on common syntactical patterns observed in text and are thus not specific to any systematic review. To validate the general applicability of our approach, we compared the data extracted using our approach versus hand curation for 35 epidemiological study abstracts manually selected for inclusion in two systematic reviews.
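
The idea of rules built from a syntactic pattern plus an interchangeable dictionary can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the function names, trigger phrases, dictionaries, and example sentence are all hypothetical, and the actual rules are not given in the abstract.

```python
import re

def build_rule(triggers, dictionary):
    """Compile one hypothetical rule: a trigger phrase followed by a dictionary term."""
    # Longest entries first so multi-word terms are preferred over their substrings.
    trig = "|".join(re.escape(t) for t in sorted(triggers, key=len, reverse=True))
    terms = "|".join(re.escape(t) for t in sorted(dictionary, key=len, reverse=True))
    return re.compile(rf"\b(?:{trig})\s+({terms})\b", re.IGNORECASE)

def extract(text, rule):
    """Return the dictionary terms matched by the rule, lower-cased, in text order."""
    return [m.group(1).lower() for m in rule.finditer(text)]

# Assumed trigger phrases and dictionaries, invented for this example.
EXPOSURE_TRIGGERS = ["exposed to", "exposure to", "association between"]
EXPOSURES = {"bisphenol A", "lead", "air pollution"}
OUTCOME_TRIGGERS = ["risk of", "followed for"]
OUTCOMES = {"asthma", "obesity"}

exposure_rule = build_rule(EXPOSURE_TRIGGERS, EXPOSURES)
outcome_rule = build_rule(OUTCOME_TRIGGERS, OUTCOMES)

abstract = (
    "We examined the association between bisphenol A and obesity. "
    "Children exposed to lead were followed for asthma."
)

print(extract(abstract, exposure_rule))  # ['bisphenol a', 'lead']
print(extract(abstract, outcome_rule))   # ['asthma']
```

The same `build_rule` function serves both characteristics; only the dictionary and trigger list change, which is the kind of reuse across reviews that the method aims for.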

Results: The returned F-score, precision, and recall ranged from 70% to 98%, 81% to 100%, and 54% to 97%, respectively. The highest precision was observed for exposure, outcome and population (100%) while recall was best for exposure and study design with 97% and 89%, respectively. The lowest recall was observed for the population (54%), which also had the lowest F-score (70%).
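
The reported figures can be checked against each other, assuming the F-score is the standard balanced F1 (the harmonic mean of precision and recall); the abstract does not state which F-measure was used. The short sketch below, with a hypothetical helper `f_score`, recomputes the population F-score from its reported precision and recall.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Population: precision 100%, recall 54% (values reported in the Results).
population_f = f_score(1.00, 0.54)
print(f"{population_f:.0%}")  # 70%

# Exposure: precision 100%, recall 97% (values reported in the Results).
exposure_f = f_score(1.00, 0.97)
print(f"{exposure_f:.0%}")  # 98%
```

Under this assumption the population value reproduces the reported 70%, and the exposure value coincides with the upper end of the reported F-score range, although the abstract does not say which characteristic attained that maximum.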

Conclusion: Our text-mining approach showed encouraging performance in identifying targeted information from abstracts of observational epidemiological studies related to environmental exposures. We have demonstrated that rules based on generic syntactic patterns in one corpus can be applied to other observational study designs by simply interchanging the dictionaries used to identify certain characteristics (i.e., outcomes, exposures). At the document level, the recognised information can assist in the selection and categorization of studies included in a systematic review.

Original language: English
Pages (from-to): 27-34
Number of pages: 8
Journal: Journal of Biomedical Informatics
Publication status: Published - 1 Jun 2017


  • Automation of systematic reviews
  • Dictionaries
  • Environmental health studies
  • Rule-based modelling
  • Text mining

