Train Big, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

ICML 2020