Understanding Self-attention of Self-supervised Audio Transformers