Roles and Utilization of Attention Heads in Transformer-based Neural Language Models