Abstract
We express Gittins indices for multi-armed bandit problems as Laurent expansions around discount factor 1. The coefficients of these expansions are then used to characterize stationary optimal policies under three optimality criteria: sensitive-discount optimality (also known as Blackwell optimality), average-reward optimality, and average-overtaking optimality. We also obtain bounds and derive optimality conditions for policies that continue playing the same bandit as long as its state remains in prescribed sets.
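The Gittins index underlying the abstract can be computed numerically for a fixed discount factor. The sketch below is an illustration only, not the paper's Laurent-expansion machinery: it uses Whittle's retirement-option characterization, in which the index of a state is the smallest per-period retirement reward at which stopping immediately is optimal. The bandit (transition matrix `P`, reward vector `r`) and the bisection tolerance are assumptions for the example.

```python
import numpy as np

def gittins_index(P, r, beta, s, tol=1e-8):
    """Gittins index of state s for a single bandit with transition matrix P,
    reward vector r, and discount factor 0 < beta < 1.

    Whittle's retirement characterization: the index is the smallest
    per-period retirement reward m at which retiring immediately in s is
    optimal in the associated optimal-stopping problem.
    """
    n = len(r)

    def stopping_value(m):
        # Value iteration for: max(retire forever at m, play once and continue).
        V = np.full(n, m / (1.0 - beta))
        for _ in range(10_000):
            V_new = np.maximum(m / (1.0 - beta), r + beta * P @ V)
            if np.max(np.abs(V_new - V)) < tol:
                break
            V = V_new
        return V

    lo, hi = float(np.min(r)), float(np.max(r))  # the index lies in [min r, max r]
    for _ in range(60):  # bisect on the retirement reward m
        m = 0.5 * (lo + hi)
        if stopping_value(m)[s] > m / (1.0 - beta) + tol:
            lo = m  # continuing strictly beats retiring, so the index exceeds m
        else:
            hi = m
    return 0.5 * (lo + hi)

# Toy bandit: state 0 pays 1 and moves to absorbing state 1, which pays 0.
P = np.array([[0.0, 1.0],
              [0.0, 1.0]])
r = np.array([1.0, 0.0])
idx0 = gittins_index(P, r, beta=0.9, s=0)  # close to 1
idx1 = gittins_index(P, r, beta=0.9, s=1)  # close to 0
```

For the toy chain the indices match the hand computation: from state 0 the best stopping rule collects reward 1 in one step (index 1), while state 1 never pays (index 0).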
| Original language | English (US) |
| --- | --- |
| Pages (from-to) | 1024-1034 |
| Number of pages | 11 |
| Journal | Annals of Applied Probability |
| Volume | 6 |
| Issue number | 3 |
| DOIs | |
| State | Published - Aug 1996 |
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Statistics, Probability and Uncertainty
Keywords
- Bandit problems
- Gittins index
- Laurent expansions
- Markov decision chains
- Optimality criteria