Multimodal Machine Learning models do not work. Here is why. Part 1/2 – The SYMPTOMS


May 26, 2022
Have you ever wondered where the problems with multimodal integrations of vision and language lie? This is the first part of Ms. Coffee Bean’s quest to uncover what’s going wrong with multimodal vision and language integration.

🔥 Optionally, buy us a coffee to boost our Coffee Bean production! ☕
Patreon: https://www.patreon.com/AICoffeeBreak
Ko-fi: https://ko-fi.com/aicoffeebreak

📺 Ms. Coffee Bean explains PROBING: https://youtu.be/fL22NAtMNYo
📺 Ms. Coffee Bean defines MULTIMODALITY: https://youtu.be/jReaoJWdO78

Outline:
* 00:00 Visual Question Answering
* 01:04 Visual Dialog Demo
* 02:30 The symptom
* 04:06 Multimodal stress test 1
* 06:35 Multimodal stress test 2

Papers:
📄 Patro, Badri, and Vinay P. Namboodiri. "Differential attention for visual question answering." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 7680-7688. 2018. https://openaccess.thecvf.com/content_cvpr_2018/papers/Patro_Differential_Attention_for_CVPR_2018_paper.pdf
📄 Shekhar, Ravi, Ece Takmaz, Raquel Fernández, and Raffaella Bernardi. "Evaluating the Representational Hub of Language and Vision Models." IWCS 2019: 211. https://www.aclweb.org/anthology/W19-0418.pdf
📄 Caglayan, O., Madhyastha, P., Specia, L., & Barrault, L. (2019, June). Probing the Need for Visual Context in Multimodal Machine Translation. In Proceedings of the 2019 Conference of the North (pp. 4159-4170). Association for Computational Linguistics. https://arxiv.org/pdf/1903.08678.pdf

Intro music: Discovery Hit by Kevin MacLeod is licensed under a Creative Commons Attribution license (https://creativecommons.org/licenses/by/4.0/)
Source: http://incompetech.com/music/royalty-free/index.html?isrc=USUAN1300023
Artist: http://incompetech.com/

Video contains emojis designed by OpenMoji – the open-source emoji and icon project. License: CC BY-SA 4.0

🔗 Links:
YouTube: https://www.youtube.com/AICoffeeBreak
Twitter: https://twitter.com/AICoffeeBreak
Reddit: https://www.reddit.com/r/AICoffeeBreak/

#AICoffeeBreak #MsCoffeeBean #multimodality #multimodal #MachineLearning #AI #research #ComputerVision #NLP
