An Operator View of Policy Gradients