TY - JOUR
T1 - A workflow for mutation extraction and structure annotation
AU - Kanagasabai, Rajaraman
AU - Choo, Khar Heng
AU - Ranganathan, Shoba
AU - Baker, Christopher J O
PY - 2007/12
Y1 - 2007/12
N2 - Rich information on point mutation studies is scattered across heterogeneous data sources. This paper presents an automated workflow for mining mutation annotations from full-text biomedical literature using natural language processing (NLP) techniques as well as for their subsequent reuse in protein structure annotation and visualization. This system, called mSTRAP (Mutation extraction and STRucture Annotation Pipeline), is designed for both information aggregation and subsequent brokerage of the mutation annotations. It facilitates the coordination of semantically related information from a series of text mining and sequence analysis steps into a formal OWL-DL ontology. The ontology is designed to support application-specific data management of sequence, structure, and literature annotations that are populated as instances of object and data type properties. mSTRAPviz is a subsystem that facilitates the brokerage of structure information and the associated mutations for visualization. For mutated sequences without any corresponding structure available in the Protein Data Bank (PDB), an automated pipeline for homology modeling is developed to generate the theoretical model. With mSTRAP, we demonstrate a workable system that can facilitate automation of the workflow for the retrieval, extraction, processing, and visualization of mutation annotations - tasks which are well known to be tedious, time-consuming, complex, and error-prone. The ontology and visualization tool are available at http://datam.i2r.a-star.edu.sg/mstrap.
UR - http://www.scopus.com/inward/record.url?scp=37849020226&partnerID=8YFLogxK
U2 - 10.1142/S0219720007003119
DO - 10.1142/S0219720007003119
M3 - Article
C2 - 18172931
AN - SCOPUS:37849020226
VL - 5
SP - 1319
EP - 1337
JO - Journal of Bioinformatics and Computational Biology
JF - Journal of Bioinformatics and Computational Biology
SN - 0219-7200
IS - 6
ER -