Dynamic workload balancing for Hadoop MapReduce

Xiaofei Hou, Ashwin T K Kumar, Johnson P. Thomas, Vijay Varadharajan

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

17 Citations (Scopus)

Abstract

Hadoop has two components: HDFS and MapReduce. HDFS is a distributed file system that stores data for Hadoop users, and MapReduce is the framework that executes their jobs. Hadoop places user data according to the space utilization of the data nodes in the cluster rather than their processing capability. Furthermore, Hadoop often runs in a heterogeneous environment, since the data nodes may not all be homogeneous. For these reasons, workload imbalances occur when Hadoop runs, resulting in poor performance. In this paper, we propose a dynamic algorithm that balances the workload between different racks of a Hadoop cluster based on information obtained from analyzing Hadoop's log files. Moving tasks from the busiest rack to another rack improves the performance of Hadoop MapReduce by reducing the running time of jobs. Our simulations indicate that the algorithm can decrease by more than 50% the remaining time of the tasks belonging to a job running on the busiest rack.
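
The abstract does not give the algorithm's details, so the following is only an illustrative sketch of the general idea (moving tasks off the busiest rack), not the authors' method. It assumes a hypothetical `Rack` record whose relative `speed` and per-task remaining work have already been estimated, for example from log analysis. The greedy rule, the names `Rack` and `rebalance`, and the sample numbers are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Rack:
    """Hypothetical per-rack summary (not a Hadoop API)."""
    name: str
    speed: float                                       # assumed relative processing capability
    tasks: list[float] = field(default_factory=list)   # remaining work per task (positive)

    def remaining_time(self) -> float:
        # Estimated time for this rack to finish all of its remaining tasks.
        return sum(self.tasks) / self.speed


def rebalance(racks: list[Rack]) -> list[tuple[float, str, str]]:
    """Greedily move tasks from the busiest rack to another rack.

    A move is made only if the receiving rack would still finish earlier
    than the busiest rack currently would. Returns (task, source, target).
    """
    moves = []
    while len(racks) > 1:
        busiest = max(racks, key=Rack.remaining_time)
        if not busiest.tasks:
            break
        task = min(busiest.tasks)  # smallest remaining task on the busiest rack
        others = [r for r in racks if r is not busiest]

        def finish_with_task(r: Rack) -> float:
            # Finish time of rack r if it also took on the candidate task.
            return r.remaining_time() + task / r.speed

        target = min(others, key=finish_with_task)
        if finish_with_task(target) >= busiest.remaining_time():
            break  # no move would shorten the busiest rack's finish time
        busiest.tasks.remove(task)
        target.tasks.append(task)
        moves.append((task, busiest.name, target.name))
    return moves


# Made-up example: rack B is twice as fast as racks A and C.
racks = [
    Rack("A", 1.0, [4.0, 4.0, 4.0, 4.0]),
    Rack("B", 2.0, [2.0]),
    Rack("C", 1.0, [3.0]),
]
before = max(r.remaining_time() for r in racks)
moves = rebalance(racks)
after = max(r.remaining_time() for r in racks)
print(f"longest remaining time: {before} before, {after} after, {len(moves)} moves")
```

The sketch ignores data-transfer cost and data locality, both of which a real scheduler would need to weigh before moving a task between racks.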

Original language: English
Title of host publication: Proceedings - 4th IEEE International Conference on Big Data and Cloud Computing, BDCloud 2014 with the 7th IEEE International Conference on Social Computing and Networking, SocialCom 2014 and the 4th International Conference on Sustainable Computing and Communications, SustainCom 2014
Editors: Jinjun Chen, Laurence T. Yang
Place of Publication: Piscataway, NJ
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 56-62
Number of pages: 7
ISBN (Electronic): 9781479967193
ISBN (Print): 9781479967209
DOIs
Publication status: Published - 2014
Event: 4th IEEE International Conference on Big Data and Cloud Computing, BDCloud 2014 - Sydney, Australia
Duration: 3 Dec 2014 → 5 Dec 2014

Other

Other: 4th IEEE International Conference on Big Data and Cloud Computing, BDCloud 2014
Country/Territory: Australia
City: Sydney
Period: 3/12/14 → 5/12/14

Keywords

  • Hadoop
  • MapReduce
  • Dynamic Workload balancing
  • OpenFlow
