Finite state multi-armed bandit problems: Sensitive-discount, average-reward and average-overtaking optimality

Michael N. Katehakis, Uriel G. Rothblum

Research output: Contribution to journal › Article


Abstract

We express Gittins indices for multi-armed bandit problems as Laurent expansions around discount factor 1. The coefficients of these expansions are then used to characterize stationary optimal policies when the optimality criteria are sensitive-discount optimality (otherwise known as Blackwell optimality), average-reward optimality and average-overtaking optimality. We also obtain bounds and derive optimality conditions for policies of a type that continue playing the same bandit as long as the state of that bandit remains in prescribed sets.
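As a concrete illustration (not drawn from the paper itself), for a fixed discount factor β < 1 the Gittins index of a state in a finite Markov reward chain can be computed numerically via the restart-in-state formulation of Katehakis and Veinott (1987): in every state the controller may either continue the chain or restart it at the state of interest, and the index equals (1 − β) times the optimal value at that state. The sketch below, assuming NumPy, uses plain value iteration; the function name and arguments are hypothetical.

```python
import numpy as np

def gittins_index(P, r, beta, state, tol=1e-10, max_iter=100_000):
    """Gittins index of `state` for a finite Markov reward chain,
    via the restart-in-state formulation: the index equals
    (1 - beta) * V(state), where V is the optimal value of the MDP
    in which every state offers a choice between continuing the
    chain and restarting it at `state`.

    P    : (n, n) transition matrix of the chain
    r    : (n,) one-step reward vector
    beta : discount factor in (0, 1)
    """
    n = len(r)
    V = np.zeros(n)
    for _ in range(max_iter):
        continue_val = r + beta * (P @ V)       # keep playing the chain
        restart_val = continue_val[state]       # restart the chain at `state`
        V_new = np.maximum(continue_val, restart_val)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return (1.0 - beta) * V[state]
```

For example, a state that is absorbing with constant reward 1 has Gittins index exactly 1 for every β, which the sketch recovers. The paper's Laurent-expansion results describe the behavior of such indices as β tends to 1.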

Original language: English (US)
Pages (from-to): 1024-1034
Number of pages: 11
Journal: Annals of Applied Probability
Volume: 6
Issue number: 3
State: Published - Aug 1996

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Keywords

  • Bandit problems
  • Gittins index
  • Laurent expansions
  • Markov decision chains
  • Optimality criteria

