Authors: Runwu Zhou, Yi-Dong Shen Description: Multi-modal clustering aims to cluster data into different groups by exploring complementary information from multiple modalities or views. Little work learns the deep fused representations and simutaneously discovers the cluster structure with a discriminative loss. In this paper, we present an End-to-end Adversarial-attention network for Multi-modal Clustering (EAMC), where adversarial learning and attention mechanism are leveraged to align the latent feature distributions and quantify the importance of modalities respectively. To benefit from the joint training, we introducea divergence-based clustering objective that not only encourages the separation and compactness of the clusters but also enjoy a clear cluster structure by embedding the simplex geometry of the output space into the loss. The proposed network consists of modality-specific feature learning, modality fusion and cluster assignment three modules. It can be trained from scratch with batch-mode based optimization and avoid an autoencoder pretraining stage. Comprehensive experiments conducted on five real-world datasets show the superiority and effectiveness of the proposed clustering method.