Abstract: Benefiting from the capability of building inter-dependencies among channels or spatial locations, attention mechanisms have recently been studied extensively and used broadly in a variety of computer vision tasks. This paper investigates lightweight but effective attention mechanisms and presents triplet attention, a novel method for computing attention weights by capturing cross-dimension interaction using a three-branch structure. For an input tensor, triplet attention builds inter-dimensional dependencies through a rotation operation followed by residual transformations, and encodes inter-channel and spatial information with negligible computational overhead. Our method is simple and efficient, and can be easily plugged into classic backbone networks as an add-on module. We demonstrate the effectiveness of our method on various challenging tasks, including image classification on ImageNet-1k and object detection on the MSCOCO and PASCAL VOC datasets. Furthermore, we provide extensive insight into the performance of triplet attention by visually inspecting GradCAM and GradCAM++ results. The empirical evaluation of our method supports our intuition about the importance of capturing dependencies across dimensions when computing attention weights.
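
To make the three-branch idea concrete, the following is a minimal NumPy sketch of cross-dimension gating on a single (C, H, W) tensor. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the abstract does not specify the pooling or the transformation applied after the rotation, so the sketch assumes a fixed weighted sum of max and mean pooling in place of any learned layer, a sigmoid gate, and a plain average of the three branches. The names `branch_gate`, `triplet_attention_sketch`, `w_max`, and `w_mean` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function, mapping values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def branch_gate(x, pooled_axis, w_max=0.5, w_mean=0.5):
    """One branch: rotate, pool over one axis, gate, rotate back.

    ASSUMPTION: the fixed weights w_max / w_mean stand in for whatever
    learned transformation the real module applies after pooling.
    """
    # "Rotation": bring the axis to be pooled to the front.
    rotated = np.moveaxis(x, pooled_axis, 0)
    # Summarise the pooled axis, leaving a 2-D map over the other two axes.
    pooled = w_max * rotated.max(axis=0) + w_mean * rotated.mean(axis=0)
    # Attention weights in (0, 1) for every position of the 2-D map.
    weights = sigmoid(pooled)
    # Broadcast the weights along the pooled axis and gate the tensor.
    gated = rotated * weights[None, ...]
    # Undo the rotation so the output has the input's layout.
    return np.moveaxis(gated, 0, pooled_axis)


def triplet_attention_sketch(x):
    """Average three branches, each pooling over a different axis of (C, H, W)."""
    branches = [branch_gate(x, axis) for axis in range(3)]
    return sum(branches) / 3.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feature_map = rng.standard_normal((4, 5, 6))  # (C, H, W)
    out = triplet_attention_sketch(feature_map)
    print(out.shape)  # (4, 5, 6)
```

Pooling over the channel axis yields an ordinary spatial (H, W) attention map, while pooling over H or W yields maps that couple the channel dimension with one spatial dimension, which is the cross-dimension interaction the abstract refers to. Because the sketch has no learned parameters, it only shows the data flow and makes no claim about accuracy or cost.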