Microbase2.0: a generic framework for computationally intensive bioinformatics workflows in the cloud.

Keith Flanagan, Sirintra Nakjang, Jennifer Hallinan, Colin Harwood, Robert P. Hirt, Matthew R. Pocock, Anil Wipat

Research output: Contribution to journal › Article › Research › peer-review

Abstract

As bioinformatics datasets grow ever larger, and analyses become increasingly complex, there is a need for data handling infrastructures to keep pace with developing technology. One solution is to apply Grid and Cloud technologies to address the computational requirements of analysing high throughput datasets. We present an approach for writing new, or wrapping existing applications, and a reference implementation of a framework, Microbase2.0, for executing those applications using Grid and Cloud technologies. We used Microbase2.0 to develop an automated Cloud-based bioinformatics workflow executing simultaneously on two different Amazon EC2 data centres and the Newcastle University Condor Grid. Several CPU years' worth of computational work was performed by this system in less than two months. The workflow produced a detailed dataset characterising the cellular localisation of 3,021,490 proteins from 867 taxa, including bacteria, archaea and unicellular eukaryotes. Microbase2.0 is freely available from http://www.microbase.org.uk/.

Language: English
Pages: 212
Number of pages: 1
Journal: Journal of integrative bioinformatics
Volume: 9
Issue number: 2
Publication status: Published - 2012

Fingerprint

Workflow
Computational Biology
Technology
Archaea
Eukaryota
Bacteria
Datasets
Proteins

Cite this

Flanagan, K., Nakjang, S., Hallinan, J., Harwood, C., Hirt, R. P., Pocock, M. R., & Wipat, A. (2012). Microbase2.0: a generic framework for computationally intensive bioinformatics workflows in the cloud. Journal of integrative bioinformatics, 9(2), 212.
Flanagan, Keith ; Nakjang, Sirintra ; Hallinan, Jennifer ; Harwood, Colin ; Hirt, Robert P. ; Pocock, Matthew R. ; Wipat, Anil. / Microbase2.0 : a generic framework for computationally intensive bioinformatics workflows in the cloud. In: Journal of integrative bioinformatics. 2012 ; Vol. 9, No. 2. pp. 212.
@article{006e593084404f22b500dfe12d5db8a5,
title = "Microbase2.0: a generic framework for computationally intensive bioinformatics workflows in the cloud.",
abstract = "As bioinformatics datasets grow ever larger, and analyses become increasingly complex, there is a need for data handling infrastructures to keep pace with developing technology. One solution is to apply Grid and Cloud technologies to address the computational requirements of analysing high throughput datasets. We present an approach for writing new, or wrapping existing applications, and a reference implementation of a framework, Microbase2.0, for executing those applications using Grid and Cloud technologies. We used Microbase2.0 to develop an automated Cloud-based bioinformatics workflow executing simultaneously on two different Amazon EC2 data centres and the Newcastle University Condor Grid. Several CPU years' worth of computational work was performed by this system in less than two months. The workflow produced a detailed dataset characterising the cellular localisation of 3,021,490 proteins from 867 taxa, including bacteria, archaea and unicellular eukaryotes. Microbase2.0 is freely available from http://www.microbase.org.uk/.",
author = "Keith Flanagan and Sirintra Nakjang and Jennifer Hallinan and Colin Harwood and Hirt, {Robert P.} and Pocock, {Matthew R.} and Anil Wipat",
year = "2012",
language = "English",
volume = "9",
pages = "212",
journal = "Journal of integrative bioinformatics",
issn = "1613-4516",
publisher = "Informationsmanagement in der Biotechnologie e.V. (IMBio e.V.)",
number = "2",

}

Flanagan, K, Nakjang, S, Hallinan, J, Harwood, C, Hirt, RP, Pocock, MR & Wipat, A 2012, 'Microbase2.0: a generic framework for computationally intensive bioinformatics workflows in the cloud.', Journal of integrative bioinformatics, vol. 9, no. 2, pp. 212.

Microbase2.0 : a generic framework for computationally intensive bioinformatics workflows in the cloud. / Flanagan, Keith; Nakjang, Sirintra; Hallinan, Jennifer; Harwood, Colin; Hirt, Robert P.; Pocock, Matthew R.; Wipat, Anil.

In: Journal of integrative bioinformatics, Vol. 9, No. 2, 2012, p. 212.

TY - JOUR

T1 - Microbase2.0

T2 - Journal of integrative bioinformatics

AU - Flanagan, Keith

AU - Nakjang, Sirintra

AU - Hallinan, Jennifer

AU - Harwood, Colin

AU - Hirt, Robert P.

AU - Pocock, Matthew R.

AU - Wipat, Anil

PY - 2012

Y1 - 2012

N2 - As bioinformatics datasets grow ever larger, and analyses become increasingly complex, there is a need for data handling infrastructures to keep pace with developing technology. One solution is to apply Grid and Cloud technologies to address the computational requirements of analysing high throughput datasets. We present an approach for writing new, or wrapping existing applications, and a reference implementation of a framework, Microbase2.0, for executing those applications using Grid and Cloud technologies. We used Microbase2.0 to develop an automated Cloud-based bioinformatics workflow executing simultaneously on two different Amazon EC2 data centres and the Newcastle University Condor Grid. Several CPU years' worth of computational work was performed by this system in less than two months. The workflow produced a detailed dataset characterising the cellular localisation of 3,021,490 proteins from 867 taxa, including bacteria, archaea and unicellular eukaryotes. Microbase2.0 is freely available from http://www.microbase.org.uk/.

UR - http://www.scopus.com/inward/record.url?scp=84872100355&partnerID=8YFLogxK

M3 - Article

VL - 9

SP - 212

JO - Journal of integrative bioinformatics

JF - Journal of integrative bioinformatics

SN - 1613-4516

IS - 2

ER -