Extractive summarisation of medical documents using domain knowledge and corpus statistics

Abeed Sarker*, Diego Mollá, Cecile Paris

*Corresponding author for this work

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

Background: Evidence Based Medicine (EBM) practice requires practitioners to extract evidence from published medical research when answering clinical queries. Because this practice is time-consuming, there is strong motivation for systems that can automatically summarise medical documents and help practitioners find relevant information.

Aim: The aim of this work is to propose an automatic, query-focused, extractive summarisation approach that selects informative sentences from medical documents.

Method: We use a corpus that is specifically designed for summarisation in the EBM domain. We use approximately half of the corpus to derive important statistics associated with the best possible extractive summaries. We take into account factors such as sentence position, sentence length, sentence content, and the type of the query posed. Using the statistics from the first set, we evaluate our approach on a separate set. The quality of the generated summaries is evaluated automatically using ROUGE, a popular tool for evaluating automatic summaries.

Results: Our summarisation approach outperforms all baselines (best baseline score: 0.1594; our score: 0.1653). Further improvements are achieved when query types are taken into account.

Conclusion: The quality of extractive summarisation in the medical domain can be significantly improved by incorporating domain knowledge and statistics derived from a specialised corpus. Such techniques can therefore be applied for content selection in end-to-end summarisation systems.

Original language: English
Pages (from-to): 478-481
Number of pages: 4
Journal: Australasian Medical Journal
Volume: 5
Issue number: 9
Publication status: Published - 2012
