Reproducing 150 Research Papers and Testing Them in the Real World: Challenges and Solutions

Mar 29, 2021
ACM Tech Talk by Grigori Fursin, President of cTuning Foundation and Founder of, on February 11, 2021.

ABSTRACT: After completing the MILEPOST project in 2009, I opened the portal and released into the public domain all my research code, data sets, experimental results, and machine learning (ML) models for our self-optimizing compiler. My goal was to continue this research and development as a community effort while crowdsourcing ML training across diverse programs, data sets, compilers, and platforms provided by volunteers. Unfortunately, the project quickly stalled as we struggled to run experiments and reproduce results across rapidly evolving systems in the real world. This experience motivated me to introduce artifact evaluation at several ACM conferences, including CGO, PPoPP, and ASPLOS, and to learn how to reproduce 150+ research papers.

In this talk, I will present the numerous challenges we faced during artifact evaluation and possible solutions. I will also describe the Collective Knowledge framework (CK), developed to automate this tedious process and bring DevOps and FAIR principles to research. The CK concept is to decompose research projects into reusable micro-services that expose the characteristics, optimizations, and SW/HW dependencies of all sub-components in a unified way via a common API and extensible meta descriptions. Portable workflows assembled from such plug & play components allow researchers and practitioners to automatically build, test, benchmark, optimize, and co-design novel algorithms across continuously changing software and hardware. Furthermore, the best results can be continuously collected in public or private repositories, together with negative results, unexpected behavior, and mispredictions, for collaborative analysis and improvement.

I will also present the platform to share portable, customizable, and reusable CK workflows from reproduced papers that can be quickly validated by the community and deployed in production. I will conclude with several practical use cases of the CK technology to improve reproducibility in ML and Systems research and to accelerate real-world deployment of efficient deep learning systems from the cloud to the edge, in collaboration with General Motors, Arm, IBM, Intel, Amazon, TomTom, the Raspberry Pi foundation, ACM, MLCommons, and MLPerf.

SPEAKER
Grigori Fursin
President, cTuning Foundation; Founder,; ACM Taskforce on Reproducibility

Grigori Fursin is a computer scientist with more than 20 years of experience pioneering novel autotuning, machine learning, and knowledge sharing techniques to modernize the development of efficient software and hardware. After completing a PhD in Computer Science at the University of Edinburgh, Grigori was a tech lead in the EU MILEPOST project with IBM developing the world’s first ML-based compiler, a Senior Research Scientist at INRIA, and a Co-Director of the Intel Exascale Lab. He is a recipient of the ACM CGO'17 Test of Time award, the INRIA award of scientific excellence, the EU HiPEAC technology transfer award, and several best paper awards. Grigori is the President of the cTuning foundation and the founder of the platform. He is an active open-source contributor, educator, and reproducibility champion, notably through his involvement in the ACM Taskforce on Reproducibility, MLCommons, and artifact evaluation. He is the author of the Collective Knowledge framework, which brings DevOps and FAIR principles to research with the help of portable, customizable, and reusable workflow templates, reproducible experiments, and auto-generated “live” papers. Grigori's mission is to bridge the growing gap between academic research and industry by helping researchers share their novel techniques as production-ready workflows that can be quickly validated in the real world and adopted by industry.

MODERATOR
Peter Mattson
Google; MLCommons

Peter Mattson leads ML Metrics at Google. He co-founded and is President of MLCommons, and co-founded and was General Chair of the MLPerf consortium that preceded it. Previously, he founded the Programming Systems and Applications Group at NVIDIA Research, was VP of software infrastructure for Stream Processors Inc (SPI), and was a managing engineer at Reservoir Labs. His research focuses on understanding machine learning models and data through quantitative metrics and analysis. Peter holds a PhD and MS from Stanford University and a BS from the University of Washington.

7:50 Personal motivation
14:58 MILEPOST Project: using ML to improve the system efficiency and reduce costs
19:00 Reproducibility efforts and ACM
24:09 Reproducing 150 research papers at CGO, PPoPP, ASPLOS, PACT, and MLSys
36:42 Bridging growing gap between academic research and industry with DevOps and FAIR principles
48:31 Validating research papers via open hackathons and tournaments
49:06 Validating research papers in the real world
51:58 Conclusion