Policy Optimization in Reinforcement Learning: RL as black-box optimization