Abstract: Obstacle avoidance and environment sensing are crucial applications in autonomous driving and robotics. Among all types of sensors, the RGB camera is widely used in these applications because it offers rich visual content at relatively low cost, and depth estimation from a single image has become one of the main focuses of recent research. However, prior works usually rely on highly complex computation and power-hungry GPUs to accomplish this task; therefore, in this paper we focus on developing a real-time, lightweight system for depth prediction.
Based on the well-known encoder-decoder architecture, we propose a supervised learning-based CNN with detachable decoders that produce depth predictions at different scales. We also formulate a novel log-depth loss function that computes the difference between the predicted depth map and the ground-truth depth map in log space, so as to increase prediction accuracy at nearby locations. To train our model, we generate ground-truth depth maps and semantic segmentations with PSMNet and DeepLabV3, respectively, and test various pre-processing methods. Through a series of ablation studies and experiments, we validate that our model can efficiently perform real-time depth prediction with far fewer parameters, and that our best trained model outperforms previous works on the KITTI dataset across various evaluation metrics.
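The abstract does not state the exact form of the log-depth loss. To illustrate why comparing depths in log space emphasizes nearby locations, the following is a minimal sketch that assumes an L1 distance between log depths, averaged over pixels that have valid (positive) ground truth. The function name `log_depth_loss`, the `eps` parameter, and the flattened-list input format are hypothetical choices for illustration, not the paper's formulation.

```python
import math


def log_depth_loss(pred, gt, eps=1e-6):
    """Mean absolute difference between predicted and ground-truth depth in log space.

    Assumption (illustrative only): L1 distance on log depths, averaged over
    pixels whose ground-truth depth is positive. Inputs are flattened depth maps.
    """
    total = 0.0
    count = 0
    for p, g in zip(pred, gt):
        if g <= 0:  # skip pixels with no ground-truth depth
            continue
        # Comparing logs makes the penalty depend on the depth ratio, not the raw difference.
        total += abs(math.log(p + eps) - math.log(g + eps))
        count += 1
    return total / count if count else 0.0


# The same 1 m absolute error costs far more at 2 m than at 50 m.
near = log_depth_loss([3.0], [2.0])    # near ≈ 0.405
far = log_depth_loss([51.0], [50.0])   # far ≈ 0.020
```

Because log(p) − log(g) = log(p/g), the loss measures relative error, so a fixed absolute error on a nearby object is penalized much more heavily than the same error far away.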
Authors: Mian Jhong Chiu, Wei-Chen Chiu, Hua-Tsung Chen, Jen-Hui Chuang