Executing large scale scientific workflow ensembles in public clouds

Qingye Jiang, Young Choon Lee, Albert Y. Zomaya

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › Peer-review

29 Citations (Scopus)

Abstract

Scientists in fields such as high-energy physics, earth science, and astronomy are developing large-scale workflow applications. In many use cases, scientists need to run a set of interrelated but independent workflows (i.e., workflow ensembles) for an entire scientific analysis. Because a workflow ensemble usually contains many sub-workflows, each of which has hundreds or thousands of jobs with precedence constraints, executing such an ensemble raises serious cost concerns even on elastic, pay-as-you-go cloud resources. In this paper, we address two main challenges in executing large-scale workflow ensembles in public clouds under both cost and deadline constraints: (1) execution coordination and (2) resource provisioning. To this end, we develop a new pull-based workflow execution system with a profiling-based resource provisioning strategy. The key idea is that homogeneity in both scientific workflows and cloud resources can be exploited to remove scheduling overhead (in execution coordination) and to minimize cost while meeting the deadline. Our results show that, by removing scheduling overhead, our system achieves an 80% speed-up compared to the well-known Pegasus workflow management system when running scientific workflow ensembles. In addition, our evaluation using Montage (an astronomical image mosaic engine) workflow ensembles on Amazon EC2 clusters of around 1,000 cores demonstrates the cost effectiveness of our resource provisioning strategy within deadlines.
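The abstract names two mechanisms: pull-based execution coordination and deadline-driven resource provisioning. A minimal Python sketch of what each could look like follows, under stated assumptions. The function names `run_pull_based` and `instances_needed`, the in-degree bookkeeping for precedence constraints, and the sizing rule n = ⌈(Σ_t count_t × runtime_t) / (deadline × cores_per_instance)⌉ are illustrative assumptions, not the authors' system and not the Pegasus API. Idle workers pull ready jobs from a shared queue instead of waiting for a central scheduler to assign them, and the sizing rule uses profiled per-job runtimes to pick the smallest homogeneous cluster whose total core time fits the deadline.

```python
import math
import queue
import threading
from collections import defaultdict


def run_pull_based(jobs, deps, num_workers, execute):
    """Hypothetical pull-based executor for one workflow DAG.

    jobs: iterable of job ids; deps: job id -> list of parent job ids
    (assumed acyclic and referring only to ids in `jobs`).
    execute(job) runs a job. Returns job ids in completion order.
    """
    jobs = list(jobs)
    if not jobs:
        return []
    children = defaultdict(list)
    pending = {job: 0 for job in jobs}  # count of unfinished parents
    for job in jobs:
        for parent in deps.get(job, []):
            children[parent].append(job)
            pending[job] += 1

    ready = queue.Queue()  # shared pool that idle workers pull from
    for job in jobs:
        if pending[job] == 0:
            ready.put(job)

    lock = threading.Lock()
    completed = []
    remaining = len(jobs)

    def worker():
        nonlocal remaining
        while True:
            job = ready.get()  # block until a ready job or a stop signal arrives
            if job is None:    # sentinel: every job has finished
                return
            execute(job)
            with lock:
                completed.append(job)
                remaining -= 1
                for child in children[job]:
                    pending[child] -= 1
                    if pending[child] == 0:  # all precedence constraints met
                        ready.put(child)
                if remaining == 0:
                    for _ in range(num_workers):
                        ready.put(None)  # wake every worker so it can exit

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return completed


def instances_needed(job_counts, profiled_runtime, deadline, cores_per_instance):
    """Hypothetical sizing rule: fewest identical instances whose combined core
    time covers the profiled work before the deadline (seconds). Assumes one job
    per core and perfect packing, so it ignores the DAG's critical path."""
    total_core_seconds = sum(n * profiled_runtime[t] for t, n in job_counts.items())
    return math.ceil(total_core_seconds / (deadline * cores_per_instance))
```

For example, `run_pull_based(["a", "b", "c"], {"c": ["a", "b"]}, 2, print)` runs `a` and `b` concurrently and releases `c` only after both finish. Because the sizing rule assumes perfect packing, it yields a lower bound on the instance count; a fuller planner would refine it with the workflow's critical path and per-instance-type profiling data.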

Original language: English
Title of host publication: Proceedings - 2015 44th International Annual Conference on Parallel Processing, ICPP 2015
Place of publication: Piscataway, NJ
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 520-529
Number of pages: 10
ISBN (Electronic): 9781467375870
ISBN (Print): 9781467375887
Publication status: Published - 2015
Event: 44th International Conference on Parallel Processing, ICPP 2015 - Beijing, China
Duration: 1 Sept 2015 to 4 Sept 2015

