Authors: Lei Zhao, Qihang Mo, Sihuan Lin, Zhizhong Wang, Zhiwen Zuo, Haibo Chen, Wei Xing, Dongming Lu Description: Although existing image inpainting approaches have been able to produce visually realistic and semantically correct results, they produce only one result for each masked input. In order to produce multiple and diverse reasonable solutions, we present Unsupervised Cross-space Translation Generative Adversarial Network (called UCTGAN) which mainly consists of three network modules: conditional encoder module, manifold projection module and generation module. The manifold projection module and the generation module are combined to learn one-to-one image mapping between two spaces in an unsupervised way by projecting instance image space and conditional completion image space into common low-dimensional manifold space, which can greatly improve the diversity of the repaired samples. For understanding of global information, we also introduce a new cross semantic attention layer that exploits the long-range dependencies between the known parts and the completed parts, which can improve realism and appearance consistency of repaired samples. Extensive experiments on various datasets such as CelebA-HQ, Places2, Paris Street View and ImageNet clearly demonstrate that our method not only generates diverse inpainting solutions from the same image to be repaired, but also has high image quality.