How cross-modal are vision and language models really? 👀 Seeing past words. [Own work]


Apr 06, 2022
What could go wrong when presenting a paper without Ms. Coffee Bean? And this time, it is not just ANY paper 😱:

📄 Paper: Parcalabescu, L., Gatt, A., Frank, A., and Calixto, I. (2021). Seeing Past Words: Testing the Cross-Modal Capabilities of Pretrained V&L Models. In Proceedings of the 1st Workshop on Multimodal Semantic Representations (MMSR), Groningen, Netherlands (Online), Association for Computational Linguistics, pp. 1–10.
🔗 https://arxiv.org/abs/2012.12352
Data 👉 https://github.com/Heidelberg-NLP/counting-probe

Other relevant videos:
📺 Replacing self-attention with the Fourier Transform: https://youtu.be/j7pWPdGEfMA
📺 Explaining ViLBERT: https://youtu.be/dd7nE4nbxN0
📺 Transformer explained: https://youtu.be/FWFA4DGuzSc

Outline:
00:00 While she is sleeping
00:52 “Multimodal faking”
01:33 Vision and Language models
02:40 How cross-modal are they really?
03:20 Counting probe
05:35 Interpolation
06:52 Fighting Ms. Coffee Bean

Music 🎵: Pretty Boy - DJ Freedem

-----------------------------------------------------------------------
🔗 Links:
YouTube: https://www.youtube.com/AICoffeeBreak
Twitter: https://twitter.com/AICoffeeBreak
Reddit: https://www.reddit.com/r/AICoffeeBreak/

#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research
