In modern visual SLAM systems, it is a standard practice to retrieve potential candidate map points from overlapping keyframes for further feature matching or direct tracking. In this work, we argue that keyframes are not the optimal choice for this task, due to several inherent limitations, such as weak geometric reasoning and poor scalability. We propose a voxel-map representation to efficiently retrieve map points for visual SLAM. In particular, we organize the map points in a regular voxel grid. Visible points from a camera pose are queried by sampling the camera frustum in a raycasting manner, which can be done in constant time using an efficient voxel hashing method. Compared with keyframes, the retrieved points using our method are geometrically guaranteed to fall in the camera field-of-view, and occluded points can be identified and removed to a certain extend. This method also naturally scales up to large scenes and complicated multi-camera configurations. Experimental results show that our voxel map representation is as efficient as a keyframe map with 5 keyframes and provides significantly higher localization accuracy (average 46% improvement in RMSE) on the EuRoC dataset. The proposed voxel-map representation is a general approach to a fundamental functionality in visual SLAM and is widely applicable.
Manasi Muglikar, Zichao Zhang, Davide Scaramuzza.Voxel Map for Visual SLAM
IEEE International Conference on Robotics and Automation (ICRA), Paris, 2020.
Our research on visual(-inertial) odometry and SLAM: http://rpg.ifi.uzh.ch/research_vo.html
More about our research: http://rpg.ifi.uzh.ch/publications.html
Affiliations: The authors are with the Robotics and Perception Group, Dep. of informatics, University of Zurich, and Dep. of Neuroinformatics, University of Zurich and ETH Zurich, Switzerland http://rpg.ifi.uzh.ch/