TY - GEN
T1 - Holistic approach for studying resource failures at scale
AU - Lee, Young Choon
AU - King, Jayden
AU - Hong, Seok-Hee
PY - 2019
Y1 - 2019
N2 - In large-scale distributed systems, such as data centers, resource failures are the norm rather than the exception. In this paper, we propose a holistic approach to studying resource failures, spanning resource failure modelling, distributed system simulation, and failure-aware scheduling algorithm design. In particular, we present (1) a simple yet practical way to model resource failures using real-world failure traces, (2) a new distributed systems simulator, and (3) two failure-aware scheduling algorithms. These scheduling algorithms are designed primarily to validate (1) and (2). Our evaluation results demonstrate the feasibility and effectiveness of our holistic approach.
AB - In large-scale distributed systems, such as data centers, resource failures are the norm rather than the exception. In this paper, we propose a holistic approach to studying resource failures, spanning resource failure modelling, distributed system simulation, and failure-aware scheduling algorithm design. In particular, we present (1) a simple yet practical way to model resource failures using real-world failure traces, (2) a new distributed systems simulator, and (3) two failure-aware scheduling algorithms. These scheduling algorithms are designed primarily to validate (1) and (2). Our evaluation results demonstrate the feasibility and effectiveness of our holistic approach.
UR - http://www.scopus.com/inward/record.url?scp=85077956235&partnerID=8YFLogxK
UR - http://purl.org/au-research/grants/arc/DP180102553
U2 - 10.1109/NCA.2019.8935032
DO - 10.1109/NCA.2019.8935032
M3 - Conference proceeding contribution
SN - 9781728125220
T3 - 2019 IEEE 18th International Symposium on Network Computing and Applications, NCA 2019
SP - 1
EP - 4
BT - 2019 IEEE 18th International Symposium on Network Computing and Applications, NCA 2019
A2 - Gkoulalas-Divanis, Aris
A2 - Marchetti, Mirco
A2 - Avresky, Dimiter R.
PB - Institute of Electrical and Electronics Engineers (IEEE)
T2 - 18th IEEE International Symposium on Network Computing and Applications, NCA 2019
Y2 - 26 September 2019 through 28 September 2019
ER -