Understanding Deep Learning Theory to Improve Model Interpretability - Tengyu Ma @ Stanford

NeurIPS 2019

Tengyu Ma is an Assistant Professor of Computer Science and Statistics at Stanford. His PhD dissertation at Princeton University received an Honorable Mention for the ACM Doctoral Dissertation Award, one of only two dissertations recognized with this distinction each year. In this interview, he introduces several of the papers he presented at NeurIPS 2019 and the work behind the ACM honorable mention, and shares his perspective on the future direction of AI research.

Full Interview Transcript

Margaret Laffan: I'm here with Dr. Ma, an Assistant Professor of Computer Science and Statistics at Stanford University. Dr. Ma, welcome. Thank you for joining us.

Prof. Tengyu Ma: Thanks. I’m very happy to be here.

Margaret Laffan: Can you share with us your research focus?

Prof. Tengyu Ma: My main research focus is the theory of machine learning, especially the theory of deep learning. We also work on transferring the insights we get from theory into practice, so I work on applications as well. In the last two years, I've also been exploring a new area, deep reinforcement learning.

Margaret Laffan: You presented five papers here at NeurIPS. Can you tell us a bit more about this research?

Prof. Tengyu Ma: Some of these papers are on understanding why deep learning works. One of them focuses on why deep learning algorithms can generalize to unseen examples even though the number of parameters is very large. This is a major open question in understanding deep learning these days, because some of the conventional theories do not apply. That is the paper titled “Data-dependent Sample Complexity of Deep Neural Networks”. Two other papers look at different aspects, such as how to train faster and generalize better. One is about understanding why the learning rate (step size) of the training algorithm matters for generalization.
And the other is about why regularization matters and how to use it in the best way. These three papers are basically about understanding why deep learning algorithms work and how to improve them using theoretical insights. One paper I’m really excited to bring up is “Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss.” It uses theory to design practical algorithms that could potentially be used in many real-world applications. The setting we study is a very imbalanced dataset. In the real world, many datasets are not as nice as benchmark datasets like ImageNet and CIFAR. Datasets in industry are often very imbalanced: some classes have a lot of examples, and some have very few. Take self-driving cars as an example: you cannot collect a lot of data for certain incidents. A deer crossing the street is a very rare event, so you will never have much data for this kind of event. But you still want to do very well on such safety-critical events. That is the problem this work tackles. We want to make sure the algorithm works not only for frequent events but also for rare ones. The general idea is to design regularization that improves generalization on rare events. Existing tools such as resampling and reweighting address the issue of training on rare events, and they do give better training results on them. But unfortunately, they don't generalize well on the rare events: the model can overfit the rare examples but not generalize. This work tries to address that issue, and we did succeed in improving on a few of these simpler algorithms and getting much better generalization on the rare events.

Margaret Laffan: You received an Honorable Mention for the ACM Doctoral Dissertation Award.
Can you tell us more about why you received it?

Prof. Tengyu Ma: The title of the award-winning thesis is “Non-convex Optimization for Machine Learning: Design, Analysis, and Understanding”. It's about understanding non-convex optimization. At the beginning of my PhD program, in 2012 or 2013, deep learning was just about to take off, and we realized that it would be the next big thing. One of the bottlenecks in understanding deep learning is that the loss functions we optimize in deep learning are non-convex, but most existing optimization theory focused on convex optimization, which doesn't necessarily apply to deep learning. Basically, we started this program to understand why we can optimize these non-convex functions, which were believed to be hard. We characterized various properties of the loss functions used in practice. These nice properties let us design new algorithms, or analyze existing algorithms, beyond just the convex setting. That's the main part of the thesis. We also analyzed non-convex models from a modeling perspective. When you have a linear model, interpreting it is relatively easy, but when you have a nonlinear, non-convex model, interpretation becomes difficult. So the second part of the thesis is about how we can interpret some of these non-convex models, especially in the context of NLP.

Margaret Laffan: At NeurIPS this week, a tremendous number of papers are being presented, along with a lot of talks and so forth. What excites you most about the direction AI research is heading in now?

Prof. Tengyu Ma: At least half of me is a theoretician. I'm very excited that people are thinking more and more about the reliability and interpretability of these models. Maybe I'm biased, but I feel like more and more papers are working on understanding the theory of deep learning and on making deep learning more interpretable and more reliable.
I think this is probably going to be the next big thing, because we are getting into a situation where all of these become important concerns, especially for social impact.

Margaret Laffan: Where do you see the next major breakthroughs in AI research in the 2020s?

Prof. Tengyu Ma: There are different areas in AI, and I can only speak about the ones related to my work. I would say that in the near future, we will understand the generalization of deep learning much better, at least in relatively standard settings. We have to care about many different settings these days because we want to apply these models to the real world, but I think in the near future we will get closer to understanding why generalization works and why optimization works in the standard settings. In the long run, because people are focusing more and more on understanding the reliability of deep learning, I feel this will be a very big area. It's not necessarily about breakthroughs in the forward direction; rather, we need to cover our backs, in the sense that we have to make sure all of our algorithms are safe, interpretable, usable, and reliable in real life. This is something I'm very excited about.

Margaret Laffan: Dr. Ma, thank you very much for joining us today.
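For readers who want a concrete picture of the label-distribution-aware margin idea Dr. Ma describes, here is a minimal NumPy sketch. It is an illustration, not the authors' code. It assumes the per-class margin rule from the LDAM paper, where class j with n_j training examples gets a margin proportional to n_j^(-1/4), so rarer classes must be classified with a larger margin. The function names, the maximum margin of 0.5, and the logit scale of 30 are illustrative assumptions. For comparison, it also shows plain inverse-frequency reweighting, one of the existing tools he mentions.

```python
import numpy as np


def inverse_frequency_weights(class_counts):
    """Baseline reweighting (hypothetical helper, for comparison only):
    weight each class by 1 / n_j, normalized so the weights average to 1."""
    counts = np.asarray(class_counts, dtype=float)
    w = 1.0 / counts
    return w * len(counts) / w.sum()


def ldam_margins(class_counts, max_margin=0.5):
    """Per-class margins Delta_j proportional to n_j ** (-1/4).

    Rescaled so the rarest class gets `max_margin`; 0.5 is an
    illustrative assumption, not a prescribed constant.
    """
    counts = np.asarray(class_counts, dtype=float)
    raw = counts ** -0.25
    return raw * (max_margin / raw.max())


def ldam_loss(logits, labels, margins, scale=30.0):
    """Mean cross-entropy after subtracting each example's class margin
    from its true-class logit. `scale` (assumed 30 here) sharpens the softmax."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=int)
    margins = np.asarray(margins, dtype=float)
    rows = np.arange(len(labels))
    adjusted = logits.copy()
    # Penalize the true-class logit: rarer classes (larger margin) must win by more.
    adjusted[rows, labels] -= margins[labels]
    z = scale * adjusted
    z -= z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[rows, labels].mean()


if __name__ == "__main__":
    counts = [5000, 500, 50]  # made-up class sizes: frequent, medium, rare
    print("reweighting:", inverse_frequency_weights(counts))
    print("LDAM margins:", ldam_margins(counts))
    logits = np.array([[2.0, 0.5, -1.0],
                       [0.1, 1.5, 0.3],
                       [0.0, 0.2, 1.0]])
    labels = np.array([0, 1, 2])
    print("plain CE:", ldam_loss(logits, labels, np.zeros(3), scale=1.0))
    print("LDAM:", ldam_loss(logits, labels, ldam_margins(counts), scale=1.0))
```

With these made-up counts, the margins come out to roughly 0.16, 0.28, and 0.5: a class with 100 times fewer examples gets a margin about 100^(1/4) ≈ 3.2 times larger. The design difference is that reweighting changes how much each example counts in the training loss, while the margin changes how far from the decision boundary rare-class examples must sit, which is the generalization-oriented regularization described in the interview.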