Abstract
The multi-armed bandit problem is often taken as a basic model of the trade-off between exploration and exploitation required for efficient optimization under uncertainty. In this article, we study the situation in which the unknown performance of a new bandit is to be evaluated and compared with that of a known one over a finite horizon. We assume that the bandits yield rewards with distributions from the one-parameter exponential family. When the objective is to maximize the Bayes expected sum of outcomes over a finite horizon, it is shown that optimal policies tend to simple limits as the length of the horizon grows large.
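The setting can be illustrated with a minimal simulation sketch. The code below assumes Bernoulli rewards (one member of the one-parameter exponential family) with a Beta prior on the new arm, and uses a simple myopic switching rule: play the new arm while its posterior mean exceeds the known arm's mean, then switch permanently. This rule, along with the parameter names `known_mean`, `true_p`, and the horizon `n`, is purely illustrative; it is not the optimal Bayes policy analyzed in the article.

```python
# Minimal sketch (not the paper's policy): a finite-horizon one-armed bandit
# in which a new Bernoulli arm with unknown success probability (Beta prior)
# competes with a known arm of fixed mean `known_mean`. The myopic rule here
# plays the new arm while its posterior mean stays above the known mean, then
# switches to the known arm for the rest of the horizon.
import random

def run_bandit(n, known_mean, true_p, alpha=1.0, beta=1.0, seed=None):
    rng = random.Random(seed)
    total = 0.0
    successes, failures = 0, 0
    switched = False  # once we switch to the known arm we never return
    for _ in range(n):
        post_mean = (alpha + successes) / (alpha + beta + successes + failures)
        if switched or post_mean < known_mean:
            switched = True
            total += known_mean  # expected payoff of the known arm
        else:
            reward = 1.0 if rng.random() < true_p else 0.0
            successes += int(reward)
            failures += int(1.0 - reward)
            total += reward
    return total

if __name__ == "__main__":
    # Average total reward over repeated runs for a horizon of n = 1000 plays.
    n, known_mean, true_p = 1000, 0.5, 0.6
    avg = sum(run_bandit(n, known_mean, true_p, seed=s) for s in range(200)) / 200
    print(f"average total reward over horizon {n}: {avg:.1f}")
```

As the horizon n grows, the early exploration cost of such a policy becomes small relative to the total reward, which is the regime in which the article characterizes the limiting form of the optimal policy.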
| Original language | English (US) |
| --- | --- |
| Pages (from-to) | 53-82 |
| Number of pages | 30 |
| Journal | Probability in the Engineering and Informational Sciences |
| Volume | 17 |
| Issue number | 1 |
| DOIs | |
| State | Published - Jan 1 2003 |
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Statistics, Probability and Uncertainty
- Management Science and Operations Research
- Industrial and Manufacturing Engineering