The Transformer neural network architecture EXPLAINED. “Attention is all you need” (NLP)

Jun 05, 2021 | 78 views

Details
⚙️ It is time to explain how Transformers work. If you are looking for a simple explanation, you have found the right video!

📄 Paper: Vaswani, Ashish, et al. "Attention is all you need." Advances in Neural Information Processing Systems. 2017. https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf

🔗 Table of contents with links:
* 00:00 The Transformer
* 00:14 Check out the implementations of various Transformer-based architectures from huggingface! https://github.com/huggingface/transformers
* 00:38 RNNs recap
* 01:14 Transformers high-level
* 01:56 Tenney, Ian, Dipanjan Das, and Ellie Pavlick. "BERT rediscovers the classical NLP pipeline." arXiv preprint arXiv:1905.05950 (2019). https://arxiv.org/pdf/1905.05950.pdf
* 02:27 The Transformer encoder
* 03:39 Self-attention compared to attention (see the sketch below)
* 04:51 Parallelisation
* 05:37 Encoding word order (see the positional-encoding sketch below)
* 06:13 Residual connections
* 06:35 Generating the output sequence
* 07:59 Masked word prediction (see the fill-mask example below)
* 08:40 Self-supervised learning FTW!
* 09:08 Pre-training and fine-tuning and Probing
* 09:44 End dance ;)

Hungry for more?
📚 Check out the blog of @Jay Alammar: http://jalammar.github.io/illustrated-transformer/! It has helped me a lot in understanding Transformers better and served as an inspiration for this video!
📺 @Yannic Kilcher paper explanation: https://youtu.be/iDulhoQ2pro

------------------------------------------
🔗 Links:
YouTube: https://www.youtube.com/channel/UCobqgqE4i5Kf7wrxRxhToQA/
Twitter: https://twitter.com/AICoffeeBreak
Reddit: https://www.reddit.com/r/AICoffeeBreak/

#AICoffeeBreak #MsCoffeeBean #TransformerinML #MachineLearning #AI #research
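As a companion to the self-attention segment (03:39), here is a minimal NumPy sketch of single-head scaled dot-product self-attention, the core operation of the Transformer encoder. The toy dimensions, random weights, and function name are illustrative assumptions, not code from the video or the paper.

```python
# A minimal sketch of single-head scaled dot-product self-attention (NumPy only).
# Matrix sizes and random weights are placeholders for illustration.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model) token embeddings; Wq/Wk/Wv: learned projection matrices."""
    Q = X @ Wq                        # queries
    K = X @ Wk                        # keys
    V = X @ Wv                        # values
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # every token scores every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the sequence
    return weights @ V                # each output is a weighted mix of all value vectors

# Toy example: 4 tokens, model dimension 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```

Because every token attends to all other tokens in one matrix multiplication, the whole sequence can be processed at once, which is the parallelisation point made at 04:51.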

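For the word-order segment (05:37), here is a sketch of the sinusoidal positional encoding described in the paper; the sequence length and model dimension below are placeholder values.

```python
# Sinusoidal positional encoding; the resulting vectors are added to the token
# embeddings so the otherwise order-agnostic self-attention can see word positions.
import numpy as np

def positional_encoding(seq_len, d_model):
    """Assumes an even d_model; returns a (seq_len, d_model) array."""
    pos = np.arange(seq_len)[:, None]                  # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]               # (1, d_model/2)
    angles = pos / np.power(10000.0, 2 * i / d_model)  # one frequency per dimension pair
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                       # even dimensions
    pe[:, 1::2] = np.cos(angles)                       # odd dimensions
    return pe

print(positional_encoding(seq_len=10, d_model=16).shape)  # (10, 16)
```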
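And for the masked word prediction and self-supervised pre-training parts (07:59, 08:40), a small usage sketch with the huggingface/transformers library linked above. The model name and example sentence are my own illustrative choices, not from the video.

```python
# Masked word prediction with a pre-trained BERT via the fill-mask pipeline
# (pip install transformers; the model is downloaded on first run).
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")  # example model choice
for prediction in unmasker("The Transformer processes the [MASK] sentence in parallel."):
    print(f"{prediction['token_str']!r}  score={prediction['score']:.3f}")
```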