Provably Efficient Policy Optimization with Thompson Sampling

NeurIPS 2020