Open bandit processes with uncountable states and time-backward effects

X. Wu, X. Zhou

Research output: Contribution to journal › Article › peer-review

Abstract

Bandit processes and the Gittins index have provided powerful and elegant theory and tools for the optimization of allocating limited resources to competitive demands. In this paper we extend the Gittins theory to more general branching bandit processes, also referred to as open bandit processes, that allow uncountable states and backward times. We establish the optimality of the Gittins index policy with uncountably many states, which is useful in such problems as dynamic scheduling with continuous random processing times. We also allow negative time durations for discounting a reward to account for the present value of the reward that was received before the present time, which we refer to as time-backward effects. This could model the situation of offering bonus rewards for completing jobs above expectation. Moreover, we discover that a common belief on the optimality of the Gittins index in the generalized bandit problem is not always true without additional conditions, and provide a counterexample. We further apply our theory of open bandit processes with time-backward effects to prove the optimality of the Gittins index in the generalized bandit problem under a sufficient condition.

Original language: English
Pages (from-to): 388-402
Number of pages: 15
Journal: Journal of Applied Probability
Volume: 50
Issue number: 2
Publication status: Published - Jun 2013
