TY - GEN
T1 - Extracting biomarker information applying natural language processing and machine learning
AU - Islam, Md Tawhidul
AU - Shaikh, Mostafa
AU - Nayak, Abhaya
AU - Ranganathan, Shoba
N1  - Copyright 2010 IEEE. Reprinted from 2010 4th International Conference on Bioinformatics and Biomedical Engineering (iCBBE): June 18-20, 2010, Chengdu, China. This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of Macquarie University’s products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to pubs-permissions@ieee.org. By choosing to view this document, you agree to all provisions of the copyright laws protecting it.
PY - 2010
Y1 - 2010
AB  - In this paper, we detail an approach to a very specific information extraction task, namely extracting biomarker information from biomedical literature. Starting with the abstract of a given publication, we first identify the evaluative sentence(s) among the other sentences by recognizing words and phrases in the text that belong to semantic categories of interest for biomedical entities (i.e., semantic category recognition). For entities such as proteins, genes and diseases, we then determine whether the statement refers to a biomarker relationship (i.e., assertion classification). Finally, we identify the biomarker relationship among the biomedical entities (i.e., semantic relationship classification). The system, Biomarker Information Extraction Tool (BIET), implements machine learning-based biomarker extraction using support vector machines (SVMs). The system is trained and tested on a corpus of oncology-related PubMed/MEDLINE literature hand-annotated with biomarker information. We investigate the effectiveness of different features for this task and examine the amount of training data needed to learn biomarker relationships among the entities. Our system achieved an average F-score of 86% for biomarker information extraction, measured against a human-annotated dataset (i.e., the gold standard).
UR - http://www.scopus.com/inward/record.url?scp=77956156444&partnerID=8YFLogxK
U2 - 10.1109/ICBBE.2010.5514717
DO - 10.1109/ICBBE.2010.5514717
M3 - Conference proceeding contribution
AN - SCOPUS:77956156444
SN - 9781424447138
SP - 1
EP - 4
BT - 2010 4th International Conference on Bioinformatics and Biomedical Engineering, iCBBE 2010
PB - Institute of Electrical and Electronics Engineers (IEEE)
CY - Piscataway, NJ
T2 - 4th International Conference on Bioinformatics and Biomedical Engineering, iCBBE 2010
Y2 - 18 June 2010 through 20 June 2010
ER -