#dataaugmentation #nlp #transformers #gpt2 #bert #researchpaperwalkthrough
Data augmentation is a widely used technique for increasing the size of the training data. It significantly increases the diversity of the data available for training, which reduces overfitting and makes ML models more robust, without actually collecting new data. In this video, we will see how Transformers can be used for data augmentation in NLP.
⏩ Abstract: Language model based pre-trained models such as BERT have provided significant gains across different NLP tasks. In this paper, we study different types of pre-trained transformer based models such as auto-regressive models (GPT-2), auto-encoder models (BERT), and seq2seq models (BART) for conditional data augmentation. We show that prepending the class labels to text sequences provides a simple yet effective way to condition the pre-trained models for data augmentation. On three classification benchmarks, pre-trained Seq2Seq model outperforms other models. Further, we explore how different pre-trained model based data augmentation differs in terms of data diversity, and how well such methods preserve the class-label information.
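To make the "prepend the class label" idea from the abstract concrete, here is a minimal Python sketch of how labelled text could be prepared before fine-tuning. It is only an illustration, not the authors' code: the helper names (prepend_label, build_lm_corpus, mask_words), the separator and end-of-sequence strings, and the 30% masking ratio are all assumptions made for this example.

```python
import random

SEP = "[SEP]"            # assumed separator between label and text
EOS = "<|endoftext|>"    # assumed end-of-sequence marker


def prepend_label(label, text, sep=SEP):
    """Return 'label SEP text' so the model sees the class before the words."""
    return f"{label} {sep} {text}"


def build_lm_corpus(examples, sep=SEP, eos=EOS):
    """Join labelled examples into one string, e.g. for an auto-regressive model."""
    return " ".join(
        f"{prepend_label(label, text, sep)} {eos}" for label, text in examples
    )


def mask_words(label, text, mask_token="[MASK]", ratio=0.3, seed=0):
    """Prepend the label, then mask a fraction of the words (auto-encoder style)."""
    rng = random.Random(seed)
    words = text.split()
    n_mask = max(1, round(len(words) * ratio))
    positions = set(rng.sample(range(len(words)), n_mask))
    masked = [mask_token if i in positions else w for i, w in enumerate(words)]
    return prepend_label(label, " ".join(masked))


examples = [
    ("positive", "a charming and fun movie"),
    ("negative", "dull plot and flat acting"),
]

corpus = build_lm_corpus(examples)              # fine-tuning text for a GPT-2 style model
prompt = prepend_label("positive", "a charming")  # generation prompt: label plus first words
masked = mask_words("negative", "dull plot and flat acting")  # masked input for a BERT style model

print(corpus)
print(prompt)
print(masked)
```

A fine-tuned model would then be asked to continue the prompt or fill in the masked words, and the outputs would be added to the training set under the prepended label.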
⏩ OUTLINE:
0:00 - Intro and Overview
0:56 - Refresher on BERT, GPT-2, BART Models
5:14 - Problems with previous approaches to Data Augmentation in NLP
6:50 - Approach
15:50 - Dataset and label information
16:35 - My thoughts and feedback on the approach
⏩ Paper Title: Data Augmentation using Pre-trained Transformer Models
⏩ Paper: https://arxiv.org/abs/2003.02245
⏩ Authors: Varun Kumar, Ashutosh Choudhary, Eunah Cho
⏩ Organisation: Alexa AI
⏩ IMPORTANT LINKS
EDA (Easy Data Augmentation) for Text in NLP - https://www.youtube.com/watch?v=-1unNLkwImw
*********************************************
⏩ Youtube - https://youtube.com/channel/UCoz8NrwgL7U9535VNc0mRPA
⏩ Blog - https://prakhartechviz.blogspot.com
⏩ LinkedIn - https://linkedin.com/in/prakhar21
⏩ Medium - https://medium.com/@prakhar.mishra
⏩ GitHub - https://github.com/prakhar21
*********************************************
Please feel free to share the content and subscribe to my channel :)
⏩ Subscribe - https://youtube.com/channel/UCoz8NrwgL7U9535VNc0mRPA?sub_confirmation=1
Tools I use for making videos :)
⏩ iPad - https://tinyurl.com/y39p6pwc
⏩ Apple Pencil - https://tinyurl.com/y5rk8txn
⏩ GoodNotes - https://tinyurl.com/y627cfsa
#techviz #datascienceguy #nlp #nlptransformers #textanalysis