Charformer: Fast Character Transformers via Gradient-based Subword Tokenization

Jun 30, 2021
Ms. Coffee Bean explains the importance of flexible tokenization and then moves on to explaining the "Charformer: Fast Character Transformers via Gradient-based Subword Tokenization" paper.

Paper 📄: Tay, Yi, Vinh Tran, Sebastian Ruder, Jai Gupta, Hyung Won Chung, Dara Bahri, Zhen Qin, Simon Baumgartner, Cong Yu, and Donald Metzler. "Charformer: Fast Character Transformers via Gradient-based Subword Tokenization." (2021). https://arxiv.org/abs/2106.12672

📺 Replacing self-attention with the Fourier Transform: https://youtu.be/j7pWPdGEfMA
📺 Convolutions instead of self-attention. When is a Transformer not a Transformer anymore?: https://youtu.be/xchDU2VMR4M
📺 Transformer explained: https://youtu.be/FWFA4DGuzSc

Outline:
00:00 What are tokenizers good for?
02:49 Where does rigid tokenization fail?
03:51 Charformer: end-to-end tokenization
08:33 Again, but in summary.
09:57 Reducing the sequence length
10:37 Meta-comments on token mixing

----------------------------------
🔗 Links:
YouTube: https://www.youtube.com/AICoffeeBreak
Twitter: https://twitter.com/AICoffeeBreak
Reddit: https://www.reddit.com/r/AICoffeeBreak/

#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research
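
The description only names the idea; as a rough companion, here is a minimal, hedged PyTorch sketch of what "end-to-end tokenization" and "reducing the sequence length" refer to in the paper's gradient-based subword tokenization (GBST): candidate character blocks of several sizes are formed at each position, a learned scorer softly weights them, and the resulting latent subwords are mean-pooled into a shorter sequence. Class and parameter names (GBSTSketch, max_block_size, downsample_rate) are illustrative assumptions, not the authors' implementation, which also handles block offsets and extra convolutions.

```python
# Illustrative sketch of gradient-based subword tokenization (GBST) from
# Tay et al. (2021). Simplified and assumption-laden: not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GBSTSketch(nn.Module):
    def __init__(self, dim, max_block_size=4, downsample_rate=2):
        super().__init__()
        self.block_sizes = list(range(1, max_block_size + 1))
        self.downsample_rate = downsample_rate
        self.score = nn.Linear(dim, 1)  # learned block scoring function

    def forward(self, char_embeddings):
        # char_embeddings: (batch, seq_len, dim) character embeddings
        b, n, d = char_embeddings.shape
        x = char_embeddings.transpose(1, 2)  # (b, d, n) for 1D pooling

        block_reprs, block_scores = [], []
        for size in self.block_sizes:
            # Mean-pool non-overlapping blocks of `size` characters,
            # then broadcast each block back to every position it covers.
            pooled = F.avg_pool1d(x, kernel_size=size, stride=size, ceil_mode=True)
            pooled = pooled.repeat_interleave(size, dim=2)[:, :, :n]  # (b, d, n)
            pooled = pooled.transpose(1, 2)                           # (b, n, d)
            block_reprs.append(pooled)
            block_scores.append(self.score(pooled))                   # (b, n, 1)

        reprs = torch.stack(block_reprs, dim=2)    # (b, n, num_sizes, d)
        scores = torch.stack(block_scores, dim=2)  # (b, n, num_sizes, 1)
        weights = scores.softmax(dim=2)            # soft, differentiable "tokenization"

        # Each position becomes a weighted mixture of its candidate blocks.
        latent = (weights * reprs).sum(dim=2)      # (b, n, d)

        # Downsample by mean pooling to shorten the sequence before the Transformer stack.
        latent = F.avg_pool1d(latent.transpose(1, 2),
                              kernel_size=self.downsample_rate,
                              stride=self.downsample_rate,
                              ceil_mode=True).transpose(1, 2)
        return latent  # (b, ceil(n / downsample_rate), d)

# Usage: 16 "characters" with 64-dim embeddings -> 8 latent subword positions.
chars = torch.randn(1, 16, 64)
print(GBSTSketch(dim=64)(chars).shape)  # torch.Size([1, 8, 64])
```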
