Roles and Utilization of Attention Heads in Transformer-based Neural Language Models

ACL 2020