Authors: Ushasi Chaudhuri, Biplab Banerjee, Avik Bhattacharya, Mihai Datcu

Description: We deal with the problem of zero-shot cross-modal image retrieval involving color and sketch images through a novel deep representation learning technique. The problem of sketch-to-image retrieval and vice versa is of practical importance, and a trained model in this respect is expected to generalize beyond the training classes, i.e., the zero-shot learning scenario. Given the drastic distribution gap between the two modalities, feature alignment is necessary to learn a shared feature space where retrieval can be carried out efficiently. In addition, the shared space should be semantically meaningful to aid the zero-shot retrieval task. The few existing techniques for zero-shot sketch-RGB image retrieval deploy deep generative models for learning the embedding space; however, training a typical GAN-like model on multi-modal image data can be non-trivial. To this end, we propose a multi-stream encoder-decoder model that simultaneously ensures improved mapping between the RGB and sketch image spaces and high discrimination in the shared semantics-driven encoded feature space. Furthermore, the model preserves the class topology of the original semantic space in the encoded feature space, which reduces the model's bias towards the training classes. Experimental results obtained on the benchmark Sketchy and TU-Berlin datasets establish the efficacy of our model, as we outperform the existing state-of-the-art techniques by a considerable margin.
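As a purely illustrative sketch (not the authors' implementation; the dimensions, linear encoders, and loss are hypothetical placeholders), the core idea of projecting two modalities into a shared encoded space and penalizing their misalignment could look like the following:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature dimensions (not taken from the paper).
D_RGB, D_SKETCH, D_SHARED = 512, 256, 64

# One modality-specific linear encoder per "stream".
W_rgb = rng.standard_normal((D_RGB, D_SHARED)) * 0.01
W_sketch = rng.standard_normal((D_SKETCH, D_SHARED)) * 0.01

def encode(x, W):
    """Project modality-specific features into the shared space."""
    return x @ W

def alignment_loss(z_rgb, z_sketch):
    """Mean squared distance between paired embeddings in the shared space."""
    return float(np.mean((z_rgb - z_sketch) ** 2))

# A batch of paired RGB / sketch feature vectors (random stand-ins here).
x_rgb = rng.standard_normal((8, D_RGB))
x_sketch = rng.standard_normal((8, D_SKETCH))

z_rgb = encode(x_rgb, W_rgb)
z_sketch = encode(x_sketch, W_sketch)
loss = alignment_loss(z_rgb, z_sketch)
print(z_rgb.shape, loss >= 0.0)
```

In a full model, the encoders would be deep networks trained jointly with decoders and a semantics-driven discrimination objective; this fragment only shows the shared-space alignment step in isolation.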