Authors: Changhee Won, Hochang Seok, Jongwoo Lim
Description: This paper presents an upright and stabilized omnidirectional depth estimation method for an arbitrarily rotated wide-baseline multi-camera inertial system. By aligning the reference rig coordinate system with the gravity direction acquired from an inertial measurement unit (IMU), we sample depth hypotheses for omnidirectional stereo matching by sweeping global spheres whose equators are parallel to the ground plane. Unary features extracted from each input image by 2D convolutional neural networks (CNNs) are then warped onto the swept spheres, and the final omnidirectional depth map is produced through cost computation by a 3D CNN-based hourglass module followed by a softargmax operation. This alignment can eliminate the wavy or unrecognizable visual artifacts in equirectangular depth maps that can cause failures in scene understanding. We demonstrate the capability of our upright and stabilized omnidirectional depth estimation through experiments on real data.
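
The geometric part of the pipeline described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`rotation_aligning`, `swept_spheres`, `soft_argmin_inverse_depth`), the choice of the z-axis as "up", the uniform sampling in inverse depth, and the use of a softmax over negated costs are all assumptions made for illustration. The 2D CNN feature extraction, the per-camera warping, and the 3D CNN hourglass module are omitted; only the gravity alignment, the sphere sampling, and the final soft regression are shown.

```python
import numpy as np

def rotation_aligning(a, b):
    """Rotation matrix R with R @ (a/|a|) == b/|b| (Rodrigues formula)."""
    a = np.asarray(a, dtype=float) / np.linalg.norm(a)
    b = np.asarray(b, dtype=float) / np.linalg.norm(b)
    v = np.cross(a, b)
    c = float(np.dot(a, b))
    if np.isclose(c, -1.0):
        # Antiparallel case: rotate 180 degrees about any axis orthogonal to a.
        axis = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-8:
            axis = np.cross(a, [0.0, 1.0, 0.0])
        axis /= np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    K = np.array([[0.0, -v[2], v[1]],
                  [v[2], 0.0, -v[0]],
                  [-v[1], v[0], 0.0]])
    return np.eye(3) + K + K @ K / (1.0 + c)

def swept_spheres(up_in_rig, inv_depths, height, width):
    """Sample gravity-aligned spheres; returns rig-frame points, shape (N, H, W, 3).

    up_in_rig: upward direction (opposite to gravity) in the rig frame,
               assumed here to come from the IMU.
    inv_depths: N inverse-depth hypotheses (assumed sampling scheme).
    """
    # R maps rig coordinates to an upright frame whose z-axis points up.
    R = rotation_aligning(up_in_rig, [0.0, 0.0, 1.0])
    theta = np.linspace(-np.pi, np.pi, width, endpoint=False)  # azimuth
    phi = np.linspace(-np.pi / 2, np.pi / 2, height)           # elevation
    theta, phi = np.meshgrid(theta, phi)                       # both (H, W)
    # Unit rays in the upright frame; the equator (phi = 0) is horizontal.
    rays_upright = np.stack([np.cos(phi) * np.cos(theta),
                             np.cos(phi) * np.sin(theta),
                             np.sin(phi)], axis=-1)
    # Row-vector form of R.T @ ray: bring the rays back into the rig frame.
    rays_rig = rays_upright @ R
    inv_depths = np.asarray(inv_depths, dtype=float)
    return rays_rig[None] / inv_depths[:, None, None, None]

def soft_argmin_inverse_depth(cost, inv_depths):
    """Softmax-weighted inverse depth from a cost volume of shape (N, H, W)."""
    logits = -np.asarray(cost, dtype=float)            # low cost -> high weight
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=0, keepdims=True)
    return np.tensordot(np.asarray(inv_depths, dtype=float), weights, axes=(0, 0))
```

In a full system, each point returned by `swept_spheres` would be projected into every camera to fetch the feature that is warped onto the sphere. Because the spheres are defined in the upright frame, the rows of the resulting equirectangular depth map stay horizontal regardless of how the rig is rotated.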