SimVLM explained: what the authors tell us, what they don't tell us, and how it all works. Enjoy with coffee!
📺 Vision & Language Transformer explained (ViLBERT): https://youtu.be/dd7nE4nbxN0
📺 ViT explained: https://youtu.be/DVoHvmww2lQ
Thanks to our Patrons who support us in Tiers 2, 3, and 4:
donor, Dres. Trost GbR, Yannik Schneider
Paper:
Wang, Zirui, Jiahui Yu, Adams Wei Yu, Zihang Dai, Yulia Tsvetkov, and Yuan Cao. "SimVLM: Simple Visual Language Model Pretraining with Weak Supervision." arXiv preprint arXiv:2108.10904 (2021). https://arxiv.org/abs/2108.10904
SimVLM Google AI Blog post: https://ai.googleblog.com/2021/10/simvlm-simple-visual-language-model-pre.html
Jia, Chao, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, and Tom Duerig. "Scaling up visual and vision-language representation learning with noisy text supervision." arXiv preprint arXiv:2102.05918 (2021). https://arxiv.org/abs/2102.05918
GPT-3 paper: Brown, Tom B., Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan et al. "Language models are few-shot learners." arXiv preprint arXiv:2005.14165 (2020). https://arxiv.org/abs/2005.14165
📺 GPT-3 video: https://youtu.be/5fqxPOaaqi0
Outline:
00:00 SimVLM
01:15 End-to-end image processing
03:01 Objective: Prefix Language Modelling
06:38 The secret ingredient
--------------------------
🔥 Optionally, buy us a coffee to help with our Coffee Bean production! ☕
Patreon: https://www.patreon.com/AICoffeeBreak
Ko-fi: https://ko-fi.com/aicoffeebreak
--------------------------
Links:
AICoffeeBreakQuiz: https://www.youtube.com/c/AICoffeeBreak/community
Twitter: https://twitter.com/AICoffeeBreak
Reddit: https://www.reddit.com/r/AICoffeeBreak/
YouTube: https://www.youtube.com/AICoffeeBreak
#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research
Video and thumbnail contain emojis designed by OpenMoji, the open-source emoji and icon project. License: CC BY-SA 4.0