Biomarker information extraction tool (BIET) development using natural language processing and machine learning

Md Tawhidul Islam, M. Shaikh, A. Nayak, S. Ranganathan

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution

7 Citations (Scopus)

Abstract

In recent years, there has been rising interest in extracting entities and relations from the biomedical literature. Many systems and approaches have been proposed for extracting biological relations, but none achieves satisfactory results, because they fail to handle the grammatical complexity and subtle features of biomedical text. In this paper, we detail an approach to one specific information extraction task: extracting biomarker information from the biomedical literature. Starting with the abstract of a given publication, we first identify the evaluative sentence(s) by recognizing words and phrases in the text that belong to semantic categories of interest for biomedical entities (semantic category recognition). For entities such as proteins, genes and diseases, we then determine whether the statement refers to a biomarker relationship (assertion classification). Finally, we identify the biomarker relationship among the biomedical entities (semantic relationship classification). Our approach uses a series of statistical models that rely heavily on local lexical and syntactic context and achieves competitive results compared with more complex NLP solutions. We conclude the paper by presenting the design of a system, the Biomarker Information Extraction Tool (BIET). BIET combines our solutions to semantic category recognition, assertion classification and semantic relationship classification into a single application that facilitates the extraction of semantic information from medical text. We designed and implemented the ML-based BIET system for biomarker extraction using support vector machines, and trained and tested it on a corpus of oncology-related PubMed/MEDLINE literature hand-annotated with biomarker information. Several tests were performed to assess the performance of the system's components, namely the semantic category recognizer, the assertion classifier and the semantic relationship classifier. The system achieves an average F-score of 86% on the task of biomarker information extraction when compared with the human-annotated dataset (i.e. the gold standard).
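The abstract reports performance as an F-score against a hand-annotated gold standard. As a minimal sketch of how such a score is commonly computed, the Python snippet below compares a set of predicted biomarker relations with a set of gold relations and derives precision, recall and the balanced F-score (F1). The function name `evaluate`, the `Scores` container and the example triples are hypothetical and invented for illustration; they are not taken from the paper, and the paper may have used a different averaging scheme across its three components.

```python
from dataclasses import dataclass
from typing import Set, Tuple

# Hypothetical representation of one extracted relation:
# (entity, disease, relation label). Not the paper's own data format.
Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Scores:
    precision: float
    recall: float
    f_score: float


def evaluate(predicted: Set[Triple], gold: Set[Triple]) -> Scores:
    """Compare predicted relations with gold-standard relations."""
    true_positives = len(predicted & gold)
    # Precision: fraction of predicted relations that are in the gold standard.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of gold-standard relations that were predicted.
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return Scores(precision, recall, 0.0)
    # Balanced F-score: harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return Scores(precision, recall, f_score)


# Placeholder triples, made up for this example only.
gold = {
    ("GENE_A", "DISEASE_X", "biomarker"),
    ("PROTEIN_B", "DISEASE_X", "biomarker"),
    ("GENE_C", "DISEASE_Y", "biomarker"),
    ("PROTEIN_D", "DISEASE_Z", "biomarker"),
}
predicted = {
    ("GENE_A", "DISEASE_X", "biomarker"),
    ("PROTEIN_B", "DISEASE_X", "biomarker"),
    ("GENE_C", "DISEASE_Y", "biomarker"),
    ("GENE_E", "DISEASE_Y", "biomarker"),  # not in the gold standard
}

result = evaluate(predicted, gold)
print(f"P={result.precision:.2f} R={result.recall:.2f} F={result.f_score:.2f}")
```

With these placeholder sets, three of the four predictions match the gold standard and three of the four gold relations are recovered, so precision, recall and F-score all come out at 0.75. An average over several components, as the abstract describes, could be taken by computing this score per component and averaging the results, though that is an assumption about the evaluation setup.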

Original language: English
Title of host publication: ICWET 2010 - International Conference and Workshop on Emerging Trends in Technology 2010, Conference Proceedings
Place of Publication: New York
Publisher: ACM
Pages: 121-126
Number of pages: 6
ISBN (Print): 9781605588124
DOIs: https://doi.org/10.1145/1741906.1741927
Publication status: Published - 2010
Event: International Conference and Workshop on Emerging Trends in Technology 2010, ICWET 2010 - Mumbai, Maharashtra, India
Duration: 26 Feb 2010 to 27 Feb 2010

Other

Other: International Conference and Workshop on Emerging Trends in Technology 2010, ICWET 2010
Country: India
City: Mumbai, Maharashtra
Period: 26/02/10 to 27/02/10

Cite this

Islam, M. T., Shaikh, M., Nayak, A., & Ranganathan, S. (2010). Biomarker information extraction tool (BIET) development using natural language processing and machine learning. In ICWET 2010 - International Conference and Workshop on Emerging Trends in Technology 2010, Conference Proceedings (pp. 121-126). New York: ACM. https://doi.org/10.1145/1741906.1741927