Abstract: Recent work has managed to learn cross-lingual word embeddings without parallel data by mapping monolingual embeddings to a shared space through adversarial training. However, the evaluation of these methods has focused on favorable conditions, using comparable corpora or closely-related languages, and we show that they often fail in more realistic scenarios. This work proposes an alternative approach based on a fully unsupervised initialization that explicitly exploits the structural similarity of the embeddings, and a robust self-learning algorithm that iteratively improves this solution. Our method succeeds in all tested scenarios and obtains the best published results on standard datasets, even surpassing previous supervised systems.
Authors: Anders Søgaard, Sebastian Ruder, Ivan Vulić (University of Copenhagen, National University of Ireland, University of Cambridge)
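To make the abstract's self-learning idea concrete, here is a minimal sketch of the generic loop such methods rely on, not the paper's exact algorithm. It assumes NumPy arrays of monolingual embeddings; the names `src_emb`, `trg_emb`, and `seed_dict` are hypothetical, and the seed dictionary, which the paper would obtain from its unsupervised structural-similarity initialization, is taken as given here. Each iteration fits an orthogonal mapping on the current dictionary and then re-induces the dictionary with the new mapping.

```python
import numpy as np

def length_normalize(emb):
    # Length-normalize rows so that dot products equal cosine similarities.
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)

def procrustes(src_emb, trg_emb, dictionary):
    # Orthogonal Procrustes: W = argmin ||X W - Z||_F with W orthogonal,
    # solved in closed form via the SVD of X^T Z over the dictionary pairs.
    src_idx, trg_idx = dictionary[:, 0], dictionary[:, 1]
    u, _, vt = np.linalg.svd(src_emb[src_idx].T @ trg_emb[trg_idx])
    return u @ vt

def induce_dictionary(src_mapped, trg_emb):
    # Re-induce a dictionary by nearest-neighbor retrieval in the shared
    # space (brute force here; practical systems restrict to frequent words
    # and use retrieval criteria such as CSLS to mitigate hubness).
    sims = src_mapped @ trg_emb.T
    return np.stack([np.arange(sims.shape[0]), sims.argmax(axis=1)], axis=1)

def self_learn(src_emb, trg_emb, seed_dict, n_iters=10):
    # Alternate between fitting the mapping on the current dictionary and
    # re-inducing the dictionary under the new mapping.
    src_emb = length_normalize(src_emb)
    trg_emb = length_normalize(trg_emb)
    dictionary = seed_dict
    for _ in range(n_iters):
        w = procrustes(src_emb, trg_emb, dictionary)
        dictionary = induce_dictionary(src_emb @ w, trg_emb)
    return w, dictionary
```

Constraining the mapping to be orthogonal keeps monolingual distances intact and gives the closed-form SVD solution used above; the robustness the abstract claims would come from additional safeguards (e.g., stochastic dictionary induction and frequency cutoffs) beyond this bare loop.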