Monocular Instance Motion Segmentation for Autonomous Driving: KITTI InstanceMotSeg Dataset and Multi-task Baseline
Abstract: Moving object segmentation is a crucial task for autonomous vehicles as it can be used to segment objects in a class agnostic manner based on their motion cues. It enables the detection of unseen objects during training (e.g., moose or a construction truck) based on their motion and independent of their appearance. Although pixel-wise motion segmentation has been studied in autonomous driving literature, it has been rarely addressed at the instance level, which would help separate connected segments of moving objects leading to better trajectory planning. As the main issue is the lack of large public datasets, we create a new InstanceMotSeg dataset comprising of 12.9K samples improving upon our KITTIMoSeg dataset. In addition to providing instance level annotations, we have added 4 additional classes which is crucial for studying class agnostic motion segmentation. We adapt YOLACT and implement a motion-based class agnostic instance segmentation model which would act as a baseline for the dataset. We also extend it to an efficient multi-task model which additionally provides semantic instance segmentation sharing the encoder. The model then learns separate prototype coefficients within the class agnostic and semantic heads providing two independent paths of object detection for redundant safety. To obtain real-time performance, we study different efficient encoders and obtain 39 fps on a Titan Xp GPU using MobileNetV2 with an improvement of 10% mAP relative to the baseline. Our model improves the previous state of the art motion segmentation method by 3.3%. Authors: Eslam Mohamed, Mahmoud Ewaisha, Mennatullah Siam, Hazem Rashed, Senthil Yogamani, Waleed Hamdy, Muhammad Helmi, Ahmad El-Sallab (Valeo, University of Alberta, Zewail City of Science and Technology)