Unit Test Case Generation with Transformers (Research Paper Walkthrough)
Details
#unittest #transformers #software
Unit testing is a way of testing the smallest piece of code that can be logically isolated in a system. This paper from Microsoft introduces a technique to automatically generate unit test cases using a transformer model. The authors also release a large-scale dataset of paired unit test cases and focal methods for public use. They evaluate their system extensively and report that it compares favourably with existing approaches such as EvoSuite and GPT-3.
⏩ Abstract: Automated Unit Test Case generation has been the focus of extensive literature within the research community. Existing approaches are usually guided by the test coverage criteria, generating synthetic test cases that are often difficult to read or understand for developers. In this paper we propose AthenaTest, an approach that aims at generating unit test cases by learning from real-world, developer-written test cases. Our approach relies on a state-of-the-art sequence-to-sequence transformer model which is able to write useful test cases for a given method under test (i.e., focal method). We also introduce methods2test - the largest publicly available supervised parallel corpus of unit test case methods and corresponding focal methods in Java, which comprises 630k test cases mined from 70k open-source repositories hosted on GitHub. We use this dataset to train a transformer model to translate focal methods into the corresponding test cases. We evaluate the ability of our model in generating test cases using natural language processing as well as code-specific criteria. First, we assess the quality of the translation compared to the target test case, then we analyze properties of the test case such as syntactic correctness and number and variety of testing APIs (e.g., asserts). We execute the test cases, collect test coverage information, and compare them with test cases generated by EvoSuite and GPT-3. Finally, we survey professional developers on their preference in terms of readability, understandability, and testing effectiveness of the generated test cases.
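To make the "focal method to test case" framing in the abstract a bit more concrete, here is a minimal sketch of how a supervised (focal method, test case) pair could be assembled. Everything in it is an assumption for illustration: the `TrainingPair` class, the `guess_focal_name` helper, and the "strip the leading `test`" naming heuristic are hypothetical and are not taken from the paper or from the methods2test tooling.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TrainingPair:
    focal_method: str  # source sequence: the method under test
    test_case: str     # target sequence: the developer-written test


def guess_focal_name(test_name: str) -> Optional[str]:
    """Hypothetical heuristic: 'testAdd' -> 'add'. Returns None if no match."""
    if test_name.startswith("test") and len(test_name) > 4:
        stem = test_name[4:]
        return stem[0].lower() + stem[1:]
    return None


def build_pairs(focal_methods: Dict[str, str],
                test_methods: Dict[str, str]) -> List[TrainingPair]:
    """Pair each test with the focal method whose name it appears to target."""
    pairs = []
    for test_name, test_body in test_methods.items():
        focal_name = guess_focal_name(test_name)
        if focal_name in focal_methods:
            pairs.append(TrainingPair(focal_methods[focal_name], test_body))
    return pairs


# Toy Java snippets stored as strings (illustrative only).
focal_methods = {
    "add": "public int add(int a, int b) { return a + b; }",
}
test_methods = {
    "testAdd": "@Test public void testAdd() { assertEquals(3, calc.add(1, 2)); }",
    "helper": "private void helper() { }",  # not a test, so it is skipped
}

pairs = build_pairs(focal_methods, test_methods)
print(len(pairs), "pair(s) built")
```

A sequence-to-sequence model would then be trained with `focal_method` as the input and `test_case` as the output, which is the translation setup the abstract describes.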
Please feel free to share the content and subscribe to my channel :)
⏩ Subscribe - https://youtube.com/channel/UCoz8NrwgL7U9535VNc0mRPA?sub_confirmation=1
⏩ OUTLINE:
0:00 - Overview
01:03 - Abstract
02:11 - Data Collection approach
03:30 - AthenaTest Pictorial Overview
04:41 - BART Transformers in a Nutshell
06:42 - Pre-training
07:48 - Fine-Tuning
08:39 - Pre-training model variants
10:03 - Can our models learn to generate test cases?
11:04 - What is the quality of generated test cases?
13:18 - How does our approach compare to EvoSuite and GPT-3?
14:16 - Do developers prefer AthenaTest’s test over EvoSuite?
⏩ Paper Title: Unit Test Case Generation with Transformers
⏩ Paper: https://arxiv.org/abs/2009.05617
⏩ Authors: Michele Tufano, Dawn Drain, Alexey Svyatkovskiy, Shao Kun Deng, Neel Sundaresan
⏩ Organisation: Microsoft
⏩ IMPORTANT LINKS
Full Playlist on BERT usecases in NLP: https://www.youtube.com/watch?v=kC5kP1dPAzc&list=PLsAqq9lZFOtV8jYq3JlkqPQUN5QxcWq0f
Full Playlist on Text Data Augmentation Techniques: https://www.youtube.com/watch?v=9O9scQb4sNo&list=PLsAqq9lZFOtUg63g_95OuV-R2GhV1UiIZ
Full Playlist on Text Summarization: https://www.youtube.com/watch?v=kC5kP1dPAzc&list=PLsAqq9lZFOtV8jYq3JlkqPQUN5QxcWq0f
Full Playlist on Machine Learning with Graphs: https://www.youtube.com/watch?v=-uJL_ANy1jc&list=PLsAqq9lZFOtU7tT6mDXX_fhv1R1-jGiYf
Full Playlist on Evaluating NLG Systems: https://www.youtube.com/watch?v=-CIlz-5um7U&list=PLsAqq9lZFOtXlzg5RNyV00ueE89PwnCbu
*********************************************
⏩ Youtube - https://www.youtube.com/c/TechVizTheDataScienceGuy
⏩ Blog - https://prakhartechviz.blogspot.com
⏩ LinkedIn - https://linkedin.com/in/prakhar21
⏩ Medium - https://medium.com/@prakhar.mishra
⏩ GitHub - https://github.com/prakhar21
⏩ Twitter - https://twitter.com/rattller
*********************************************
Tools I use for making videos :)
⏩ iPad - https://tinyurl.com/y39p6pwc
⏩ Apple Pencil - https://tinyurl.com/y5rk8txn
⏩ GoodNotes - https://tinyurl.com/y627cfsa
#techviz #datascienceguy #ai #researchpaper #naturallanguageprocessing #bart