Asymptotically optimal multi-armed bandit policies under a cost constraint

Apostolos Burnetas, Odysseas Kanavetas, Michael N. Katehakis

Research output: Contribution to journal › Article


Abstract

We consider the multi-armed bandit problem under a cost constraint. Successive samples from each population are i.i.d. with unknown distribution, and each sample incurs a known, population-dependent cost. The objective is to design an adaptive sampling policy that maximizes the expected sum of the outcomes of n samples, subject to the constraint that the average cost does not exceed a given bound on every sample path. We establish an asymptotic lower bound for the regret of feasible, uniformly fast convergent policies and construct a class of policies that achieve this bound. We also provide their explicit form for Normal distributions with unknown means and known variances.
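
The following is a minimal LaTeX sketch of the formulation the abstract describes. The notation (outcomes X_{i,t}, costs c_i, policy \pi, cost bound C_0, optimal rate V*, constant M) is assumed here for illustration and is not taken from this page; the exact definitions and the form of the lower-bound constant are given in the paper itself.

\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Populations i = 1, ..., k yield i.i.d. outcomes X_{i,t} with unknown
% distributions; each sample from population i incurs a known cost c_i.
% A policy \pi selects a population \pi_t at each time t = 1, ..., n.
The objective is
\[
  \max_{\pi}\; \mathbb{E}\!\left[\sum_{t=1}^{n} X_{\pi_t,\,t}\right]
  \quad\text{subject to}\quad
  \frac{1}{n}\sum_{t=1}^{n} c_{\pi_t} \le C_0
  \quad\text{on every sample path.}
\]
% With V^* denoting the optimal expected reward rate achievable under
% known distributions and the same cost constraint (an assumption of
% this sketch), the regret of a feasible policy is
\[
  R_n(\pi) \;=\; n\,V^{*} \;-\; \mathbb{E}\!\left[\sum_{t=1}^{n} X_{\pi_t,\,t}\right],
\]
% and the asymptotic lower bound asserts that, for every feasible
% uniformly fast convergent policy, regret grows at least logarithmically:
\[
  \liminf_{n\to\infty} \frac{R_n(\pi)}{\log n} \;\ge\; M,
\]
% for a distribution-dependent constant M \ge 0 (not reproduced here).
\end{document}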

Original language: English (US)
Pages (from-to): 284-310
Number of pages: 27
Journal: Probability in the Engineering and Informational Sciences
Volume: 31
Issue number: 3
DOIs
State: Published - Jul 1 2017

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
  • Management Science and Operations Research
  • Industrial and Manufacturing Engineering

Keywords

  • applied probability
  • stochastic modelling
