Conservative Exploration in Bandits and Reinforcement Learning