Abstract: Location awareness is a fundamental requirement for intelligent systems such as self-driving vehicles, delivery drones, and mobile devices. Using on-board sensors (e.g. cameras, inertial sensors, and LiDAR), previous researchers have developed a variety of localization systems built on hand-crafted models and algorithms. Under ideal conditions, these sensors and models can estimate system states accurately for an unbounded duration. In real-world environments, however, issues such as imperfect sensor measurements, inaccurate system modelling, complex environmental dynamics, and unrealistic constraints degrade the accuracy and reliability of localization systems. This thesis therefore aims to leverage machine learning approaches to overcome the intrinsic limitations of human-designed localization models.
This research presents learning methods that estimate self-motion from multimodal sensor data to achieve accurate and robust localization. First, we exploit the inertial sensor, a completely ego-centric and relatively robust sensor, to develop the Inertial Odometry Neural Network (IONet), which learns motion transformations from raw inertial data and reconstructs accurate trajectories. This inertial-only solution shows impressive performance in locating people and wheeled objects without being affected by environmental conditions. IONet was further refined into L-IONet, a lightweight framework that reduces the computational burden of model training and testing and enables real-time inference on low-end devices. As a first step in this direction, we collected and released the Oxford Inertial Odometry Dataset (OxIOD), a large-scale collection of inertial motion data comprising 158 sequences and totalling 42 km, to train and comprehensively evaluate our proposed models.
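The idea of reconstructing a trajectory from learned motion transformations can be illustrated with a minimal sketch. Assuming the model emits, per window of inertial data, a displacement magnitude and a heading change (a polar motion increment; the function name and update rule here are illustrative, not the thesis implementation), the trajectory follows by simple integration:

```python
import numpy as np

def reconstruct_trajectory(polar_deltas, x0=0.0, y0=0.0, psi0=0.0):
    """Integrate per-window (displacement, heading-change) predictions,
    as an IONet-style model would emit, into a 2-D trajectory."""
    xs, ys = [x0], [y0]
    psi = psi0
    for dl, dpsi in polar_deltas:
        psi += dpsi                      # accumulate heading
        xs.append(xs[-1] + dl * np.cos(psi))  # advance along heading
        ys.append(ys[-1] + dl * np.sin(psi))
    return np.array(xs), np.array(ys)
```

For example, three windows each predicting a 1 m step with no turn yield a straight 3 m track.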
Secondly, we present a novel generic framework that learns selective sensor fusion, enabling more robust and accurate odometry estimation and localization in real-world scenarios. Two fusion strategies are proposed: soft fusion, implemented in a deterministic fashion; and hard fusion, which introduces stochastic noise and learns to keep the most relevant feature representations while discarding useless or misleading information. Both are trained end-to-end and can be applied to complementary pairs of sensor modalities, e.g. RGB images, inertial measurements, depth images, and LiDAR point clouds. We offer a visualization and interpretation of the fusion masks to give deeper insight into the relative strengths of each stream.
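The contrast between the two strategies can be sketched as follows. In this simplified, hypothetical version (the weights `w`, `b` stand in for learned parameters; the real framework operates on deep feature maps), soft fusion applies a continuous sigmoid-valued mask to the concatenated features, while hard fusion samples a stochastic binary mask so that each feature is either kept intact or dropped entirely:

```python
import numpy as np

def soft_fusion(feat_a, feat_b, w, b):
    # Deterministic: a continuous [0, 1] gate per feature dimension.
    x = np.concatenate([feat_a, feat_b])
    mask = 1.0 / (1.0 + np.exp(-(w @ x + b)))  # sigmoid gate
    return mask * x

def hard_fusion(feat_a, feat_b, w, b, rng):
    # Stochastic: sample a binary keep/drop decision per dimension.
    x = np.concatenate([feat_a, feat_b])
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))     # keep probability
    mask = (rng.random(p.shape) < p).astype(float)
    return mask * x
```

Inspecting the learned masks (near-zero gates on a modality's features) is what allows the visualization of each stream's relative contribution.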
Finally, we leverage deep generative models to propose Sequential Invariant Domain Adaptation (SIDA), which mitigates the domain-shift problem of deep neural network based localization models and works well on long continuous sensor data. Its key novelty is a shared encoder that converts the input sequence into a domain-invariant hidden representation, encouraging useful semantic features to be retained while domain-specific features are discarded. We apply the proposed SIDA framework to deep learning based inertial odometry and human activity recognition to demonstrate its effectiveness in improving generalization to new domains. We show that SIDA is able to transform raw sensor data into an accurate trajectory in new, unlabelled domains, benefiting from knowledge transferred from the labelled source domain. Through extensive experiments, all our proposed methods demonstrate their effectiveness and potential for achieving accurate and robust localization in real-world environments.
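The shared-encoder idea can be sketched minimally. In this illustrative version, a single encoder maps sequences from both the labelled source domain and the unlabelled target domain into a common feature space, and a simple mean-feature discrepancy stands in for the adversarial domain objective that drives the representation toward invariance (the thesis framework uses deep generative/adversarial components; the encoder form and loss here are assumptions for illustration only):

```python
import numpy as np

def encode(x, W):
    # Shared encoder: the SAME weights W are applied to source and
    # target sequences, so both map into one feature space.
    # x: (input_dim, seq_len) -> h: (hidden_dim, seq_len)
    return np.tanh(W @ x)

def domain_discrepancy(h_src, h_tgt):
    # Proxy for the domain-invariance objective: squared distance
    # between mean feature statistics of the two domains. Driving
    # this toward zero encourages domain-invariant representations.
    return float(np.sum((h_src.mean(axis=1) - h_tgt.mean(axis=1)) ** 2))
```

Training would minimize the task loss on the source domain plus a weighted version of this discrepancy, so that features useful for the task but indistinguishable across domains survive.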
Authors: Changhao Chen (University of Oxford)