Few-shot learning for “normal-sized” language models like BERT or ALBERT with pattern-exploiting training (PET), explained. This video covers PET, iPET, and ADAPET. Choose your favorite! GPT-3 is not the only few-shot learner, at least not on SuperGLUE.
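At the core of PET is a pattern-verbalizer pair. The pattern rewrites an input as a cloze question containing one mask token. The verbalizer maps each label to a single vocabulary word, so a masked language model can score the labels by how well each word fits the mask. Below is a minimal sketch of that scoring step only. It assumes a two-class sentiment task, a hypothetical pattern and verbalizer chosen for illustration, and made-up `mask_logits` that stand in for the logits a real MLM would output at the mask position. It does not cover PET's fine-tuning, ensembling, or distillation steps.

```python
import math

MASK = "[MASK]"

def pattern(review: str) -> str:
    # Hypothetical pattern: turn the input into a cloze question with exactly one mask.
    return f"{review} All in all, it was {MASK}."

# Hypothetical verbalizer: each label maps to a single vocabulary word.
VERBALIZER = {"negative": "bad", "positive": "great"}

def label_distribution(mask_logits: dict[str, float]) -> dict[str, float]:
    """Softmax over the MLM logits of the verbalizer words only.

    `mask_logits` stands in for the model's logits at the mask position,
    keyed by vocabulary word (an assumption for this sketch).
    """
    scores = {label: mask_logits[word] for label, word in VERBALIZER.items()}
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - m) for label, s in scores.items()}
    z = sum(exps.values())
    return {label: e / z for label, e in exps.items()}

if __name__ == "__main__":
    print(pattern("The pasta was cold and the waiter was rude."))
    # Toy logits, not real model output; words outside the verbalizer are ignored.
    print(label_distribution({"bad": 1.0, "great": 3.0, "okay": 2.5}))
```

Restricting the softmax to the verbalizer words is what turns a generic masked-word prediction into a classifier over the task's labels.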
Schick, T., & Schütze, H. (2020). Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference. arXiv preprint arXiv:2001.07676. https://arxiv.org/abs/2001.07676
Schick, T., & Schütze, H. (2020). It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners. arXiv preprint arXiv:2009.07118. https://arxiv.org/abs/2009.07118
Tam, D., Menon, R. R., Bansal, M., Srivastava, S., & Raffel, C. (2021). Improving and Simplifying Pattern Exploiting Training. arXiv preprint arXiv:2103.11955. https://arxiv.org/abs/2103.11955
00:00 Small language models are also few-shot learners
01:30 Few-shot learning for GPT-3
02:58 Few-shot learning for everyone: PET
08:00 The gist of PET