Is normalization indispensable for training deep neural networks?

NeurIPS 2020