"Spatially Aware Multimodal Transformers for TextVQA" is work by Yash Kant, Dhruv Batra, Peter Anderson, Alex Schwing, Devi Parikh, Jiasen Lu, and Harsh Agrawal of Georgia Tech, Facebook AI Research (FAIR), and the University of Illinois at Urbana-Champaign.
This work has been accepted to the European Conference on Computer Vision (ECCV) 2020.
Full paper: https://arxiv.org/pdf/2007.12146.pdf
Website: www.ml.gatech.edu
Twitter: @mlatgt
Instagram: @mlatgeorgiatech