⚙️ It is time to explain how Transformers work. If you are looking for a simple explanation, you found the right video!
📄 Paper: Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems. 2017. https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf
🔗 Table of contents with links:
* 00:00 The Transformer
* 00:14 Check out the implementations of various Transformer-based architectures from Hugging Face! https://github.com/huggingface/transformers
* 00:38 RNNs recap
* 01:14 Transformers high-level
* 01:56 Tenney, Ian, Dipanjan Das, and Ellie Pavlick. "BERT rediscovers the classical NLP pipeline." arXiv preprint arXiv:1905.05950 (2019). https://arxiv.org/pdf/1905.05950.pdf
* 02:27 The Transformer encoder
* 03:39 Self-attention compared to attention (see the code sketch right after this list)
* 04:51 Parallelisation
* 05:37 Encoding word order
* 06:13 Residual connections
* 06:35 Generating the output sequence
* 07:59 Masked word prediction
* 08:40 Self-supervised learning FTW!
* 09:08 Pre-training, fine-tuning, and probing
* 09:44 End dance ;)
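For reference, the self-attention discussed at 03:39 boils down to the scaled dot-product attention equation from the paper: Attention(Q, K, V) = softmax(QKᵀ / √d_k) V. Here is a minimal NumPy sketch of that equation; the toy shapes, seed, and function names are my own illustration, not code from the video or the Hugging Face library.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  (Vaswani et al., 2017, Eq. 1)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # similarity of every query with every key
    weights = softmax(scores, axis=-1)     # attention weights; each row sums to 1
    return weights @ V                     # weighted sum of the value vectors

# Toy self-attention: queries, keys, and values all come from the same 4-token sequence
np.random.seed(0)
X = np.random.randn(4, 8)                  # 4 tokens, embedding size 8
print(scaled_dot_product_attention(X, X, X).shape)  # (4, 8)
```

In a full Transformer layer, Q, K, and V are separate learned linear projections of X and the computation is repeated across multiple heads; the sketch keeps them identical only to show the core equation.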
Hungry for more?
📚 Check out the blog of @Jay Alammar: http://jalammar.github.io/illustrated-transformer/! It helped me a lot in understanding Transformers and served as an inspiration for this video!
📺 @Yannic Kilcher paper explanation: https://youtu.be/iDulhoQ2pro
------------------------------------------
🔗 Links:
YouTube: https://www.youtube.com/channel/UCobqgqE4i5Kf7wrxRxhToQA/
Twitter: https://twitter.com/AICoffeeBreak
Reddit: https://www.reddit.com/r/AICoffeeBreak/
#AICoffeeBreak #MsCoffeeBean #TransformerinML #MachineLearning #AI #research