An Operator View of Policy Gradient