Dynamically Adjusting Transformer Batch Size by Monitoring Gradient Direction Change

ACL 2020