Leverage the Average: an Analysis of KL Regularization in Reinforcement Learning