How Netflix's Machine Learning System Metaflow Works | Savin Goyal @ Netflix | Stanford MLSys Seminar

Mar 18, 2021
Episode 17 of the Stanford MLSys Seminar Series "Taming the long tail of industrial ML applications" by Savin Goyal, software engineer @ Netflix responsible for Metaflow.

Abstract: Data Science usage at Netflix goes much beyond our eponymous recommendation systems. It touches almost all aspects of our business - from optimizing content delivery and informing buying decisions to fighting fraud. Our unique culture affords our data scientists extraordinary freedom of choice in ML tools and libraries, all of which results in an ever-expanding set of interesting problem statements and a diverse set of ML approaches to tackle them. Our data scientists, at the same time, are expected to build, deploy, and operate complex ML workloads autonomously without the need to be significantly experienced with systems or data engineering. In this talk, I will discuss some of the challenges involved in improving the development and deployment experience for ML workloads. I will focus on Metaflow, our ML framework, which offers useful abstractions for managing the model’s lifecycle end-to-end, and how a focus on human-centric design positively affects our data scientists' velocity.

Speaker bio: Savin is a software engineer at Netflix responsible for Metaflow, Netflix's ML platform. He focuses on building generalizable infrastructure to accelerate the impact of data science at Netflix and beyond.

1:10 Introduction
1:47 Netflix Business Model with ML Applications
5:47 Role of Data Scientist & Challenges
15:30 Data Science Stack
18:29 Structure your code as a DAG
19:28 Starting with ML script
20:50 Develop locally like any other script
21:15 How Metaflow stores datasets
23:31 You can restart from any step
24:01 Straightforward grid search
24:50 Metaflow handles parallel execution
25:14 Offload compute to the cloud and let Metaflow handle the details
27:50 Specify compute dependencies easily
29:28 Ready for production?
32:05 Ready for integration?
34:34 Other features of Metaflow
37:08 Questions & Answers