Clinical information extraction using word representations

Shervin Malmasi, Hamed Hassanzadeh, Mark Dras

Research output: Contribution to journal › Conference paper › peer-review


A central task in clinical information extraction is the classification of sentences to identify key information in publications, such as interventions and outcomes. Surface tokens and part-of-speech tags have been the most commonly used feature types for this task. In this paper we evaluate the use of word representations, induced from approximately 100 million tokens of unlabelled in-domain data, as a form of semi-supervised learning for this task. We take an approach based on unsupervised word clusters, using the Brown clustering algorithm, with results showing that this method outperforms the standard features. We inspect the induced word representations and the resulting discriminative model features to gain further insights about this approach.
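As an illustration only, the sketch below shows one common way Brown clusters can be turned into sentence-level features: each word is mapped to its cluster bit string, and prefixes of that bit string are counted as features. The toy cluster table, the function name `cluster_features`, and the prefix lengths are all hypothetical choices made for this example, not the paper's actual setup.

```python
from collections import Counter

# Hypothetical toy mapping from word to Brown cluster bit string.
# Real bit strings would come from running Brown clustering on
# unlabelled in-domain text; these values are made up for illustration.
TOY_CLUSTERS = {
    "patients": "0010",
    "participants": "0011",
    "placebo": "1100",
    "aspirin": "1101",
    "received": "0110",
}


def cluster_features(tokens, clusters, prefix_lengths=(2, 4)):
    """Count cluster-prefix features for one tokenised sentence.

    The prefix lengths are an assumption: shorter prefixes give coarser
    word classes, longer prefixes give finer ones.
    """
    feats = Counter()
    for tok in tokens:
        path = clusters.get(tok.lower())
        if path is None:
            continue  # word was not seen during clustering
        for n in prefix_lengths:
            feats[f"brown{n}={path[:n]}"] += 1
    return feats


features = cluster_features(
    "Patients received aspirin or placebo".split(), TOY_CLUSTERS
)
```

Here "aspirin" and "placebo" share the two-bit prefix `11`, so they contribute to the same coarse feature even though their full bit strings differ. Feature counts like these could then be passed to any standard sentence classifier alongside token and part-of-speech features.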
Original language: English
Pages (from-to): 66-74
Number of pages: 9
Journal: ALTA 2015 : Proceedings of Australasian Language Technology Association Workshop 2015
Publication status: Published - 2015
Event: Australasian Language Technology Association Workshop (13th : 2015) - Parramatta, NSW
Duration: 8 Dec 2015 to 9 Dec 2015

