Upper Confidence Reinforcement Learning with Value Targeted Regression