Learning Rate Annealing Can Provably Help Generalization