This episode is a live recording of our interview with Matteo Poggi at the CVPR 2019 conference. Matteo Poggi is a postdoctoral researcher in the Department of Computer Science and Engineering at the University of Bologna in Italy. He served as a reviewer for CVPR this year.
Poggi also presented a tutorial titled "Learning-based depth estimation from stereo and monocular images: successes, limitations and future challenges". His presentation focused on applying machine learning techniques to 3D sensing. During the interview, he shared key applications of the techniques covered in his tutorial, the challenges he currently faces and the differences between research in the US and Italy.
/ Full Interview Transcripts /
Wenli: We have Matteo here with us. He’s from Italy. He is one of the organizers of a tutorial. The tutorial he co-organized is “Learning-based depth estimation from stereo and monocular images.” Tell us about yourself.
Matteo Poggi: I'm a postdoc researcher at the University of Bologna in Italy. I got my PhD one year ago, and now I'm working on computer vision and mainly on [depth] estimation from stereo or monocular images, mostly using machine learning and deep learning techniques.
Wenli: Can you tell us a little more about the tutorial that you co-organized? And also how did you organize it? Any highlights you would like to share with us?
Matteo Poggi: We started the project of this tutorial about six months ago, more or less, and we tried to propose this tutorial at a smaller conference, which is 3DV. It was hosted in Verona, Italy last year, and I was in touch with Philippos Mordohai from Stevens Institute of Technology in New Jersey. And I worked with my research group, which is made up of Fabio Tosi and Stefano Mattoccia. Stefano has been my supervisor since my PhD and I still work in his group, and we decided to work together with Philippos and Konstantinos, who is a PhD student in his group, because of their expertise in this field. Stefano and Philippos have been working a lot on depth estimation from stereo images [over the] last [few] years.
And we found that it was a great time to propose an overview of what happened in this topic, particularly in the last three to five years, because in these last [few] years the development of machine learning and deep learning changed many things in this topic. We decided it was a good time to provide an overview about this, and we had very positive feedback from the 3DV tutorial. We decided to try the jump to CVPR. The tutorial was accepted, so here we are.
Wenli: What is it mainly about?
Matteo Poggi: It is mainly about the machine learning techniques applied to 3D sensing, particularly depth measurement from [a] single or a couple of images. In particular, it highlights how in the last three years we moved from the hand-designed approach, where the developers were just tuning the parameters and experimenting and tuning the algorithms, to [a] machine learning-based approach in which the algorithms just learn from data, which is the best way to approach the problem.
Wenli: So what would be the takeaway for our junior researchers and senior researchers?
Matteo Poggi: Starting from junior researchers, we tried to organize the tutorial in a way that they could see which [were] the state-of-the-art techniques prior to these advances in machine learning and deep learning, which were the ideas that we followed up to three years ago. We tried to guide them through the novelties of this year, despite the different approaches, because nowadays we use deep learning, while five years ago we didn’t have deep learning. [We] highlight [how] our design choices of the machine learning techniques are still somehow inspired by what we did three years ago because, for instance, the geometry in stereo matching is still used when deploying the machine learning, when designing a new architecture, when training a new architecture. I think this is particularly important for new researchers, just to let them know which ideas in the past were successful, so they may be successful again, bringing them to a deep learning-based approach.
For senior researchers this can be a great opportunity to get [updated] data from the last five years of progress in this field in a couple of hours. [It is] also an overview of how many works people are proposing this year, [which] are plenty compared to what happened before. We also have an increasing number of people and research activities moving to this topic, so [the aim is] to make them aware of what's happening and maybe make them more curious about the topic.
Wenli: For the topic that you're working on, “Learning-based depth estimation from stereo and monocular images”, what are some of the business applications that we will see in the future?
Matteo Poggi: For example, applications based on augmented reality can benefit a lot from these techniques, in particular from techniques using a single image. Because any smartphone so far has at least one camera, if we can estimate or [meet] a rough reconstruction of [the] scene, then we can do whatever we want. We can make an object appear on the table, we can make some animals walk around and disappear behind the sofa because we know the sofa is here, but the background is farther. While for other applications like autonomous driving or autonomous navigation, we can leverage stereo matching, which at this point is a pretty mature technique. Using machine learning techniques can provide a much more robust solution that can somehow scale to many, many environments.
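[Editor's note: The "disappearing behind the sofa" effect comes down to a per-pixel depth comparison: a virtual object is drawn only where it is closer to the camera than the estimated scene. The snippet below is a minimal sketch of that idea under our own assumptions, not code from the tutorial. The function name `composite_with_occlusion`, the list-of-lists image layout and the sample depths in metres are all hypothetical.]

```python
def composite_with_occlusion(camera_rgb, scene_depth, virtual_rgb, virtual_depth):
    """Overlay a virtual layer on a camera image using a per-pixel depth test.

    All four arguments are 2D lists of equal size (hypothetical layout).
    virtual_rgb holds None wherever the virtual object does not cover a pixel.
    Depths are distances from the camera, e.g. in metres.
    """
    output = []
    for cam_row, sd_row, v_row, vd_row in zip(
        camera_rgb, scene_depth, virtual_rgb, virtual_depth
    ):
        out_row = []
        for cam_px, sd, v_px, vd in zip(cam_row, sd_row, v_row, vd_row):
            # The virtual pixel is visible only if it exists and lies in
            # front of the real surface estimated at that pixel.
            visible = v_px is not None and vd < sd
            out_row.append(v_px if visible else cam_px)
        output.append(out_row)
    return output


# One image row: a sofa at 1.5 m on the left, background at 4.0 m elsewhere.
camera_rgb = [["sofa", "wall", "wall"]]
scene_depth = [[1.5, 4.0, 4.0]]

# A virtual animal at 2.5 m covering the first two pixels.
virtual_rgb = [["animal", "animal", None]]
virtual_depth = [[2.5, 2.5, float("inf")]]

result = composite_with_occlusion(camera_rgb, scene_depth, virtual_rgb, virtual_depth)
print(result)
```

[In this toy row the animal is hidden where the sofa is nearer than 2.5 m and shown where only the farther background lies behind it. In a real application the scene depth would come from a depth estimation model rather than hand-written values.]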
Wenli: What are some of the challenges that you’re facing right now?
Matteo Poggi: One of the main challenges is, of course, bringing all of these works outside of academia. For instance, every time we have a new idea, a new system using [stereo] images or monocular images, many experiments are carried out, but on [a] limited amount of data with respect to any possible scenario that we would meet in reality. One of the main challenges would be to bring these approaches to work everywhere without any constraints on the environment, because if we want to have a system that can be deployed and work properly, we need to be sure that it can work everywhere.
Wenli: Why is there a gap between academia and industry? Is it because there’s not enough data in academia?
Matteo Poggi: I think data is one of the most important problems because even if we imagine acquiring millions and millions of images, we will never be able to see everything from outside. Our systems need to be robust to any unexpected situation, so this is one of the most challenging parts of the work.
Wenli: What are the most impressive breakthroughs in the past years?
Matteo Poggi: I think in the last three years, the most impressive progress has been made on single image depth estimation, [that is,] estimating the depth of the scene out of a single image. Five years ago, we had some solutions for this, but the results were extremely far from what we have now. Now, thanks to deep learning, we had an incredible boost to these approaches, and some of the results that we are able to collect right now were unthinkable just five years ago. Particularly for some of these solutions, although it may be a little early to deploy a solution like this to work everywhere, if you deploy some of these in very constrained environments, they can work extremely well.
Wenli: You are right now a postdoc at the University of Bologna in Italy. You travel very often and you attend these conferences. What are some differences in the research in Italy compared to the research in the US?
Matteo Poggi: In my personal experience, initially I worked with a quite small group of researchers. We had about four people. Compared to many research groups that I've seen so far across Europe and the US, four people are an extremely tiny group. What I see here is that many groups have plenty of people. It's exciting to work with so many people, in particular[ly] people with many different backgrounds, because they can change your point of view, which I think is one of the most important things for someone who does research.
On the other side, I also like to work in a small group, like my group, because it's easier to know each other. It’s like we are more like friends than co-workers, so it's something that can help to collaborate and also help to converge our ideas.
Wenli: It is easier and faster to change.
Matteo Poggi: It makes it easier to work out a solution.
Wenli: Is this the first CVPR you attended?
Matteo Poggi: No, it’s the second. I also attended CVPR 2017 in Honolulu.
Wenli: What are the differences you noticed between the two years of conferences?
Matteo Poggi: The number of people.
Wenli: Yeah, it’s growing fast.
Matteo Poggi: Two years ago we had about 4000 people.
Wenli: This year it’s 9000.
Matteo Poggi: Yeah. We can see the difference. But I think it’s a good metric about the health of the community because our research area is growing year by year. It’s a good thing.
Wenli: Well, thank you so much for joining us here, Matteo.
Matteo Poggi: Thank you.