Abstract
Scientific applications are growing in scale, not only in computation but also in data. Many of these applications consist of inter-related tasks with data dependencies; that is, they are scientific workflows. Efficiently coordinating the execution of scientific workflows is of great practical importance, and the core of such coordination is scheduling and resource allocation. In this paper, we present three scheduling heuristics for running large-scale, data-intensive scientific workflows in clouds. The three heuristics are designed to leverage a slot queue threshold, data locality, and data prefetching, respectively. Although each heuristic can be used independently, we also demonstrate how they can be used together to tackle different issues in running data-intensive workflows in clouds. We demonstrate the practicality of our algorithms by implementing them and incorporating them into our workflow execution system (DEWE). Using Montage, an astronomical image mosaic engine, as an example workflow and Amazon EC2 as the cloud environment, we evaluate the performance of our heuristics primarily in terms of completion time (makespan). We also examine workflow execution across its different phases to identify their impact on performance. Our algorithms scale well and reduce makespan by up to 27%.
Original language | English |
---|---|
Title of host publication | Proceedings - 15th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2014 |
Place of Publication | Piscataway, NJ |
Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
Pages | 180-185 |
Number of pages | 6 |
ISBN (Electronic) | 9781479983346 |
ISBN (Print) | 9781479983353 |
Publication status | Published - Dec 2014 |
Externally published | Yes |
Event | 15th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2014 - Hong Kong, China. Duration: 9 Dec 2014 → 11 Dec 2014 |
Other
Other | 15th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2014 |
---|---|
Country/Territory | China |
City | Hong Kong |
Period | 9/12/14 → 11/12/14 |
Keywords
- Cloud Computing
- Data-Intensive
- Scheduling
- Scientific Workflows