MCTS as Regularized Policy Optimization