TY - JOUR
T1 - Open bandit processes with uncountable states and time-Ackward effects
AU - Wu, X.
AU - Zhou, X.
PY - 2013/6
Y1 - 2013/6
N2 - Bandit processes and the Gittins index have provided powerful and elegant theory and tools for the optimization of allocating limited resources to competitive demands. In this paper we extend the Gittins theory to more general branching bandit processes, also referred to as open bandit processes, that allow uncountable states and backward times. We establish the optimality of the Gittins index policy with uncountably many states, which is useful in such problems as dynamic scheduling with continuous random processing times. We also allow negative time durations for discounting a reward to account for the present value of the reward that was received before the present time, which we refer to as time-backward effects. This could model the situation of offering bonus rewards for completing jobs above expectation. Moreover, we discover that a common belief on the optimality of the Gittins index in the generalized bandit problem is not always true without additional conditions, and provide a counterexample. We further apply our theory of open bandit processes with time-backward effects to prove the optimality of the Gittins index in the generalized bandit problem under a sufficient condition.
AB - Bandit processes and the Gittins index have provided powerful and elegant theory and tools for the optimization of allocating limited resources to competitive demands. In this paper we extend the Gittins theory to more general branching bandit processes, also referred to as open bandit processes, that allow uncountable states and backward times. We establish the optimality of the Gittins index policy with uncountably many states, which is useful in such problems as dynamic scheduling with continuous random processing times. We also allow negative time durations for discounting a reward to account for the present value of the reward that was received before the present time, which we refer to as time-backward effects. This could model the situation of offering bonus rewards for completing jobs above expectation. Moreover, we discover that a common belief on the optimality of the Gittins index in the generalized bandit problem is not always true without additional conditions, and provide a counterexample. We further apply our theory of open bandit processes with time-backward effects to prove the optimality of the Gittins index in the generalized bandit problem under a sufficient condition.
UR - http://www.scopus.com/inward/record.url?scp=84879766621&partnerID=8YFLogxK
U2 - 10.1239/jap/1371648948
DO - 10.1239/jap/1371648948
M3 - Article
AN - SCOPUS:84879766621
SN - 0021-9002
VL - 50
SP - 388
EP - 402
JO - Journal of Applied Probability
JF - Journal of Applied Probability
IS - 2
ER -