The trade-off between convergence error and communication delays in decentralized stochastic gradient descent~(SGD) is dictated by the sparsity of the inter-worker communication graph. In this paper, we propose MATCHA, a decentralized SGD method where we use matching decomposition sampling of the base graph to parallelize inter-worker information exchange so as to significantly reduce communication delay. At the same time, under standard assumptions for any general topology, in spite of the significant reduction of the communication delay, MATCHA maintains the same convergence rate as that of the state-of-the-art in terms of epochs. Experiments on a suite of datasets and deep neural networks validate the theoretical analysis and demonstrate the effectiveness of the proposed scheme as far as reducing communication delays is concerned.
Speakers: Jianyu Wang