Abstract
We consider semi-Markov decision processes (SMDPs) with finite state and action spaces. We study two criteria: the expected average reward per unit time subject to a sample-path constraint on the average cost per unit time, and the expected time-average variability. Under a certain condition, for communicating SMDPs we construct (randomized) stationary policies that are ε-optimal for each criterion; under the unichain assumption the policy is optimal for the first criterion, and for a specific variability function the policy for the second criterion is optimal and pure. For general multichain SMDPs, similar results are obtained via a state-space decomposition approach. A sketch of a standard formulation of the constrained criterion is given below.
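For orientation, a common way to formalize the first criterion for finite SMDPs is sketched below; the notation (reward r, cost c, sojourn times τ, constraint level α) is assumed for illustration and may differ from the paper's own.

```latex
% Long-run expected average reward per unit time of a policy \pi from state x,
% where r is the one-stage reward and \tau the sojourn time:
J(\pi, x) \;=\; \liminf_{n \to \infty}
  \frac{\mathbb{E}_x^{\pi}\!\left[\sum_{k=0}^{n-1} r(X_k, A_k)\right]}
       {\mathbb{E}_x^{\pi}\!\left[\sum_{k=0}^{n-1} \tau(X_k, A_k)\right]}

% Sample-path constraint: the average cost per unit time must stay below a
% level \alpha along almost every trajectory, not merely in expectation:
\limsup_{n \to \infty}
  \frac{\sum_{k=0}^{n-1} c(X_k, A_k)}
       {\sum_{k=0}^{n-1} \tau(X_k, A_k)}
  \;\le\; \alpha
  \qquad \mathbb{P}_x^{\pi}\text{-a.s.}
```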
| Original language | English (US) |
| --- | --- |
| Pages (from-to) | 635-657 |
| Number of pages | 23 |
| Journal | Probability in the Engineering and Informational Sciences |
| Volume | 21 |
| Issue number | 4 |
| State | Published - Oct 2007 |
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Statistics, Probability and Uncertainty
- Management Science and Operations Research
- Industrial and Manufacturing Engineering