Clue: Cross-modal Coherence Modeling for Caption Generation

ACL 2020