TY - JOUR
T1 - Microbase2.0: a generic framework for computationally intensive bioinformatics workflows in the cloud
AU - Flanagan, Keith
AU - Nakjang, Sirintra
AU - Hallinan, Jennifer
AU - Harwood, Colin
AU - Hirt, Robert P.
AU - Pocock, Matthew R.
AU - Wipat, Anil
PY - 2012
Y1 - 2012
N2 - As bioinformatics datasets grow ever larger, and analyses become increasingly complex, there is a need for data handling infrastructures to keep pace with developing technology. One solution is to apply Grid and Cloud technologies to address the computational requirements of analysing high throughput datasets. We present an approach for writing new applications, or wrapping existing ones, and a reference implementation of a framework, Microbase2.0, for executing those applications using Grid and Cloud technologies. We used Microbase2.0 to develop an automated Cloud-based bioinformatics workflow executing simultaneously on two different Amazon EC2 data centres and the Newcastle University Condor Grid. Several CPU years' worth of computational work was performed by this system in less than two months. The workflow produced a detailed dataset characterising the cellular localisation of 3,021,490 proteins from 867 taxa, including bacteria, archaea and unicellular eukaryotes. Microbase2.0 is freely available from http://www.microbase.org.uk/.
UR - http://www.scopus.com/inward/record.url?scp=84872100355&partnerID=8YFLogxK
M3 - Article
C2 - 23001322
AN - SCOPUS:84872100355
VL - 9
SP - 212
JO - Journal of Integrative Bioinformatics
JF - Journal of Integrative Bioinformatics
IS - 2
ER -