TY - GEN
T1 - Diversity vs. parallelism in distributed computing with redundancy
AU - Peng, Pei
AU - Soljanin, Emina
AU - Whiting, Philip
PY - 2020
Y1 - 2020
N2 - Distributed computing enables parallel execution of tasks that make up a large computing job. Random fluctuations in service times (inherent to computing environments) often cause a non-negligible number of straggling tasks with long completion time. Redundancy, in the form of task replication and erasure coding, has emerged as a potentially powerful way to curtail the variability in service time, as it provides diversity that allows a job to be completed when only a subset of redundant tasks gets executed. Thus both redundancy and parallelism reduce the execution time, but compete for resources of the system. In situations of constrained resources (here fixed number of parallel servers), increasing redundancy reduces the available level of parallelism. We characterize the diversity vs. parallelism tradeoff for three common models of task size dependent execution times. We find that different models operate optimally at different levels of redundancy, and thus may require very different code rates.
AB - Distributed computing enables parallel execution of tasks that make up a large computing job. Random fluctuations in service times (inherent to computing environments) often cause a non-negligible number of straggling tasks with long completion time. Redundancy, in the form of task replication and erasure coding, has emerged as a potentially powerful way to curtail the variability in service time, as it provides diversity that allows a job to be completed when only a subset of redundant tasks gets executed. Thus both redundancy and parallelism reduce the execution time, but compete for resources of the system. In situations of constrained resources (here fixed number of parallel servers), increasing redundancy reduces the available level of parallelism. We characterize the diversity vs. parallelism tradeoff for three common models of task size dependent execution times. We find that different models operate optimally at different levels of redundancy, and thus may require very different code rates.
UR - http://www.scopus.com/inward/record.url?scp=85090404361&partnerID=8YFLogxK
U2 - 10.1109/ISIT44484.2020.9174030
DO - 10.1109/ISIT44484.2020.9174030
M3 - Conference proceeding contribution
AN - SCOPUS:85090404361
SN - 9781728164335
SP - 257
EP - 262
BT - 2020 IEEE International Symposium on Information Theory
PB - Institute of Electrical and Electronics Engineers (IEEE)
CY - Piscataway, NJ
T2 - 2020 IEEE International Symposium on Information Theory, ISIT 2020
Y2 - 21 July 2020 through 26 July 2020
ER -