Holistic approach for studying resource failures at scale

Young Choon Lee, Jayden King, Seok-Hee Hong

Research output: Chapter in Book/Report/Conference proceeding > Conference proceeding contribution

1 Citation (Scopus)

Abstract

In large-scale distributed systems, such as data centers, resource failures are the norm rather than the exception. In this paper, we propose a holistic approach to studying resource failures that spans resource failure modelling, distributed system simulation, and failure-aware scheduling algorithm design. In particular, we present (1) a simple yet practical way to model resource failures using real-world failure traces, (2) a new distributed systems simulator, and (3) two failure-aware scheduling algorithms. These scheduling algorithms are designed primarily to validate (1) and (2). Our evaluation results demonstrate the feasibility and effectiveness of our holistic approach.

Original language: English
Title of host publication: 2019 IEEE 18th International Symposium on Network Computing and Applications, NCA 2019
Editors: Aris Gkoulalas-Divanis, Mirco Marchetti, Dimiter R. Avresky
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 1-4
Number of pages: 4
ISBN (Electronic): 9781728125213
ISBN (Print): 9781728125220
Publication status: Published - 2019
Event: 18th IEEE International Symposium on Network Computing and Applications, NCA 2019, Cambridge, United States
Duration: 26 Sep 2019 to 28 Sep 2019

Publication series

Name: 2019 IEEE 18th International Symposium on Network Computing and Applications, NCA 2019

Conference

Conference: 18th IEEE International Symposium on Network Computing and Applications, NCA 2019
Country: United States
City: Cambridge
Period: 26/09/19 to 28/09/19
