This episode is an interview with Vincent Sitzmann of Stanford University, discussing highlights from his paper, "Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations," accepted as an oral presentation at the NeurIPS 2019 conference.
Vincent is a fourth-year Ph.D. student in the Stanford Computational Imaging Laboratory, advised by Prof. Gordon Wetzstein. His research interest lies in 3D-structure-aware neural scene representations, a novel way for AI to represent information about our 3D world. His goal is to allow AI to reason about the world from visual observations, such as inferring a complete model of a scene, including its geometry, materials, and lighting, from only a few observations; a task that is simple for humans but currently impossible for AI.
Paper At A Glance: The advent of deep learning has given rise to neural scene representations - learned mathematical models of a 3D environment. However, many of these representations do not explicitly reason about geometry and thus do not account for the underlying 3D structure of the scene. In contrast, geometric deep learning has explored 3D-structure-aware representations of scene geometry, but requires explicit 3D supervision. We propose Scene Representation Networks (SRNs), a continuous, 3D-structure-aware scene representation that encodes both geometry and appearance. SRNs represent scenes as continuous functions that map world coordinates to a feature representation of local scene properties. By formulating image formation as a differentiable ray-marching algorithm, SRNs can be trained end-to-end from only 2D observations, without access to depth or geometry. This formulation naturally generalizes across scenes, learning powerful geometry and appearance priors in the process. We demonstrate the potential of SRNs by evaluating them for novel view synthesis, few-shot reconstruction, joint shape and appearance interpolation, and unsupervised discovery of a non-rigid face model.
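To make the core idea concrete, here is a minimal toy sketch of the two ingredients the abstract names: a continuous function (a small MLP) mapping a 3D world coordinate to a feature vector of local scene properties, and a ray marcher that repeatedly queries it and steps along the ray. All sizes, weights, and the softplus step predictor are illustrative assumptions, not the paper's architecture (SRNs use a learned, recurrent ray marcher trained end-to-end from 2D images).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy scene representation: an MLP phi mapping a 3D world coordinate
# to a feature vector of local scene properties (sizes are illustrative).
W1 = rng.normal(0.0, 0.5, (3, 32))
b1 = np.zeros(32)
W2 = rng.normal(0.0, 0.5, (32, 16))
b2 = np.zeros(16)

def phi(x):
    """Map a world coordinate x of shape (3,) to a 16-dim feature vector."""
    h = np.maximum(W1.T @ x + b1, 0.0)  # ReLU hidden layer
    return W2.T @ h + b2

# Hypothetical linear step-length predictor standing in for the paper's
# learned ray marcher; softplus keeps every step length positive.
Wd = rng.normal(0.0, 0.1, 16)

def march(origin, direction, n_steps=10):
    """Walk a ray through the scene, querying phi at each visited point."""
    d = direction / np.linalg.norm(direction)
    x = origin.astype(float).copy()
    for _ in range(n_steps):
        feat = phi(x)
        step = np.log1p(np.exp(Wd @ feat))  # softplus -> positive step
        x = x + step * d
    return x, phi(x)

x_final, feat = march(np.zeros(3), np.array([0.0, 0.0, 1.0]))
print(feat.shape)  # (16,)
```

In the paper, the feature at the final intersection point is decoded into a pixel color, and because every operation above is differentiable, the whole pipeline can be trained from 2D images alone.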