Average Reward Reinforcement Learning with Monotonic Policy Improvement