Time-average Markov decision problems with finite state and action spaces are considered. Several definitions of variability are introduced and compared. For the multichain
case, it is shown that a stationary policy maximizes one of these criteria, namely, the
expected long-run average variability. An algorithm which uses a decomposition
approach to locate such an optimal policy is given. The algorithm produces an
optimal pure policy under convexity conditions for the variability function. The
Unichain semi-Markov decision processes are then examined. It is shown that a stationary
policy maximizes the expected average reward subject to the constraint that the long-run
average cost is below a certain level with probability 1. A fractional program is
presented which produces such an optimal stationary policy. Two-person zero-sum
stochastic games are also considered. In the case that only one player controls the
transition probabilities, stationary policies are shown to exist which give the saddle-point
solution for the multichain expected long-run average reward. An algorithm
using the decomposition theory is developed to find optimal stationary policies for
both players. In the case that both players control the transition probabilities, a
generalized game is obtained. The solution of this game yields optimal stationary
policies for the players when the game is irreducible.