Running Data-Intensive Scientific Workflows in the Cloud

Chiaki Sato, Luke M. Leslie, Young Choon Lee, Albert Y. Zomaya, Rajiv Ranjan

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

Abstract

Scientific applications are growing increasingly large, not only in computation but also in data. Many of these applications consist of inter-related tasks with data dependencies; that is, they are scientific workflows. Efficiently coordinating the execution of scientific workflows is of great practical importance, and the core of such coordination is scheduling and resource allocation. In this paper, we present three scheduling heuristics for running large-scale, data-intensive scientific workflows in clouds. The three heuristic algorithms are designed to leverage a slot queue threshold, data locality, and data prefetching, respectively. Although each heuristic can be used independently, we also demonstrate how they can be used together to tackle different issues in running data-intensive workflows in clouds. We have implemented the algorithms and incorporated them into our workflow execution system (DEWE). Using Montage, an astronomical image mosaic engine, as an example workflow and Amazon EC2 as the cloud environment, we evaluate the performance of our heuristics primarily in terms of completion time (makespan). We also examine workflow execution across its different phases to identify their impact on performance. Our algorithms scale well and reduce makespan by up to 27%.
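The abstract names the three ideas (slot queue threshold, data locality, data prefetching) without giving their details. The following is only a toy sketch of how such ideas might combine in a task-placement step; it is not the DEWE implementation or the paper's algorithms. All names (`Worker`, `pick_worker`, `assign`, `queue_threshold`) and the tie-breaking rule are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical worker node; not a DEWE data structure."""
    name: str
    slots: int                       # concurrent task slots (assumed)
    queued: int = 0                  # tasks currently assigned to this worker
    local_files: set = field(default_factory=set)


def pick_worker(inputs, workers, sizes, queue_threshold):
    """Return the eligible worker holding the most input bytes locally.

    A worker is eligible while its assigned tasks stay below
    slots + queue_threshold (an assumed reading of "slot queue threshold").
    Returns None if every worker's queue is full.
    """
    eligible = [w for w in workers if w.queued < w.slots + queue_threshold]
    if not eligible:
        return None

    def local_bytes(w):
        return sum(sizes[f] for f in inputs if f in w.local_files)

    # Prefer more local data; break ties by the shorter queue.
    return max(eligible, key=lambda w: (local_bytes(w), -w.queued))


def assign(inputs, workers, sizes, queue_threshold=1):
    """Place one task and list the files that would need staging.

    Returns (worker_name, files_to_prefetch), or None if no worker is free.
    """
    w = pick_worker(inputs, workers, sizes, queue_threshold)
    if w is None:
        return None
    w.queued += 1
    # "Prefetching" is modelled only as marking missing inputs as staged.
    missing = [f for f in inputs if f not in w.local_files]
    w.local_files.update(missing)
    return w.name, missing
```

In this sketch a task goes to the worker that already holds most of its input data, unless that worker's queue has reached the threshold, in which case the task spills to another worker and its inputs are listed for staging.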

Original language: English
Title of host publication: Proceedings - 15th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2014
Place of Publication: Piscataway, NJ
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 180-185
Number of pages: 6
ISBN (Electronic): 9781479983346
ISBN (Print): 9781479983353
DOIs
Publication status: Published - Dec 2014
Externally published: Yes
Event: 15th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2014 - Hong Kong, China
Duration: 9 Dec 2014 to 11 Dec 2014

Other

Other: 15th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2014
Country/Territory: China
City: Hong Kong
Period: 9/12/14 to 11/12/14

Keywords

  • Cloud Computing
  • Data-Intensive
  • Scheduling
  • Scientific Workflows
