ICCV19: Oral Session 3.1B - Vision, Language, & Text

ICCV 2019

Link to indexed video: https://conftube.com/video/oFDF1yT0T-4

1. VaTeX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
   Xin Wang, Jiawei Wu, Junkun Chen, Lei Li, Yuan-Fang Wang, William Yang Wang
   https://conftube.com/video/oFDF1yT0T-4?tocitem=2

2. A Graph-Based Framework to Bridge Movies and Synopses
   Yu Xiong, Qingqiu Huang, Lingfeng Guo, Hang Zhou, Bolei Zhou, Dahua Lin
   https://conftube.com/video/oFDF1yT0T-4?tocitem=9

3. From Strings to Things: Knowledge-Enabled VQA Model That Can Read and Reason
   Ajeet Kumar Singh, Anand Mishra, Shashank Shekhar, Anirban Chakraborty
   https://conftube.com/video/oFDF1yT0T-4?tocitem=13

4. Counterfactual Critic Multi-Agent Training for Scene Graph Generation
   Long Chen, Hanwang Zhang, Jun Xiao, Xiangnan He, Shiliang Pu, Shih-Fu Chang
   https://conftube.com/video/oFDF1yT0T-4?tocitem=20

5. Robust Change Captioning
   Dong Huk Park, Trevor Darrell, Anna Rohrbach
   https://conftube.com/video/oFDF1yT0T-4?tocitem=29

6. Attention on Attention for Image Captioning
   Lun Huang, Wenmin Wang, Jie Chen, Xiao-Yong Wei
   https://conftube.com/video/oFDF1yT0T-4?tocitem=41

7. Dynamic Graph Attention for Referring Expression Comprehension
   Sibei Yang, Guanbin Li, Yizhou Yu
   https://conftube.com/video/oFDF1yT0T-4?tocitem=48

8. Visual Semantic Reasoning for Image-Text Matching
   Kunpeng Li, Yulun Zhang, Kai Li, Yuanyuan Li, Yun Fu
   https://conftube.com/video/oFDF1yT0T-4?tocitem=54

9. Phrase Localization Without Paired Training Examples
   Josiah Wang, Lucia Specia
   https://conftube.com/video/oFDF1yT0T-4?tocitem=61

10. Learning to Assemble Neural Module Tree Networks for Visual Grounding
    Daqing Liu, Hanwang Zhang, Feng Wu, Zheng-Jun Zha
    https://conftube.com/video/oFDF1yT0T-4?tocitem=70

11. A Fast and Accurate One-Stage Approach to Visual Grounding
    Zhengyuan Yang, Boqing Gong, Liwei Wang, Wenbing Huang, Dong Yu, Jiebo Luo
    https://conftube.com/video/oFDF1yT0T-4?tocitem=78

12. Zero-Shot Grounding of Objects From Natural Language Queries
    Arka Sadhu, Kan Chen, Ram Nevatia
    https://conftube.com/video/oFDF1yT0T-4?tocitem=86

13. Towards Unconstrained End-to-End Text Spotting
    Siyang Qin, Alessandro Bissacco, Michalis Raptis, Yasuhisa Fujii, Ying Xiao
    https://conftube.com/video/oFDF1yT0T-4?tocitem=99

14. What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis
    Jeonghun Baek, Geewook Kim, Junyeop Lee, Sungrae Park, Dongyoon Han, Sangdoo Yun, Seong Joon Oh, Hwalsuk Lee
    https://conftube.com/video/oFDF1yT0T-4?tocitem=108