Transformers in Computer Vision - Crossminds
Transformers in Computer Vision
The Transformer architecture has achieved state-of-the-art results on many NLP tasks. In computer vision, CNNs have been the dominant models for most tasks since 2012. Applying Transformer architectures to vision tasks is a newer line of research, aiming to reduce model complexity while improving training efficiency and scalability.
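The core idea behind applying Transformers to images, as explored in several of the papers below, is to treat an image as a sequence of tokens (for example, flattened patches) and let self-attention relate every token to every other. The sketch below is illustrative only, not code from any of the listed papers; `patchify` and `self_attention` are hypothetical helper names, and learned projections and multi-head structure are omitted for brevity.

```python
import numpy as np

def patchify(image, patch):
    # Split an (H, W, C) image into non-overlapping patches and
    # flatten each patch into a single token vector.
    H, W, C = image.shape
    rows, cols = H // patch, W // patch
    patches = image[:rows * patch, :cols * patch].reshape(
        rows, patch, cols, patch, C).transpose(0, 2, 1, 3, 4)
    return patches.reshape(rows * cols, patch * patch * C)

def self_attention(x):
    # Scaled dot-product self-attention with identity Q/K/V projections:
    # every patch token attends to every other patch token.
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32, 3))   # toy 32x32 RGB image
tokens = patchify(img, patch=8)          # 16 tokens, each of dim 8*8*3 = 192
out = self_attention(tokens)
print(tokens.shape, out.shape)           # (16, 192) (16, 192)
```

Unlike a convolution, whose receptive field grows only with depth, a single attention layer here is global: each of the 16 patch tokens aggregates information from all the others in one step.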
Other recommended papers

Dirk Weissenborn, Scaling Autoregressive Video Models. ICLR 2020

H. Hu, Relation Networks for Object Detection. CVPR 2018

Yuxin Wu, Group Normalization. ECCV 2018

Zilong Huang, CCNet: Criss-Cross Attention for Semantic Segmentation. ICCV 2019

Irwan Bello, Attention Augmented Convolutional Networks. ICCV 2019

C. Sun, VideoBERT: A Joint Model for Video and Language Representation Learning. ICCV 2019

Prajit Ramachandran, Stand-Alone Self-Attention in Vision Models. NeurIPS 2019

Niki Parmar, Image Transformer. ICML 2018