Words aren't enough, their order matters: On the Robustness of Grounding Visual Referring Expressions

ACL 2020