Episode 17 of the Stanford MLSys Seminar Series: "Taming the long tail of industrial ML applications" by Savin Goyal, software engineer @ Netflix responsible for Metaflow.
Abstract: Data Science usage at Netflix goes well beyond our eponymous recommendation systems. It touches almost all aspects of our business - from optimizing content delivery and informing buying decisions to fighting fraud. Our unique culture affords our data scientists extraordinary freedom of choice in ML tools and libraries, which results in an ever-expanding set of interesting problem statements and a diverse set of ML approaches to tackle them. At the same time, our data scientists are expected to build, deploy, and operate complex ML workloads autonomously, without needing significant experience in systems or data engineering. In this talk, I will discuss some of the challenges involved in improving the development and deployment experience for ML workloads. I will focus on Metaflow, our ML framework, which offers useful abstractions for managing the model’s lifecycle end-to-end, and how a focus on human-centric design positively affects our data scientists' velocity.
Speaker bio:
Savin is a software engineer at Netflix responsible for Metaflow, Netflix's ML platform. He focuses on building generalizable infrastructure to accelerate the impact of data science at Netflix and beyond.
1:10 Introduction
1:47 Netflix Business Model with ML Applications
5:47 Role of Data Scientist & Challenges
15:30 Data Science Stack
18:29 Structure your code as a DAG
19:28 Starting with ML script
20:50 Develop locally like any other script
21:15 How Metaflow stores datasets
23:31 You can restart from any step
24:01 Straightforward grid search
24:50 Metaflow handles parallel execution
25:14 Offload compute to the cloud and let Metaflow handle the details
27:50 Specify compute dependencies easily
29:28 Ready for production?
32:05 Ready for integration?
34:34 Other features of Metaflow
37:08 Questions & Answers