Abstract: In this paper we consider a graph clustering problem with a given number of clusters and approximate desired sizes of the clusters. One possible motivation for such task could be the problem of databases or servers allocation within several given large computational clusters, where we want related objects to share the same cluster in order to minimize latency and transaction costs. This task differs from the original community detection problem. To solve this task, we adopt some ideas from Glauber Dynamics and Label Propagation Algorithm. At the same time we consider no additional information about node labels, so the task has the nature of unsupervised learning. We propose an algorithm for the problem, show that it works well for a large set of parameters of Stochastic Block Model (SBM) and theoretically show that its running time complexity for achieving almost exact recovery is of O(n·d·ω) for the mean-field SBM with d being the average degree and ω tending to infinity arbitrary slow. Other significant advantage of the proposed approach is its local nature, which means it can be efficiently distributed with no scheduling or synchronization.
Authors: Konstantin Avrachenkov, Maksim Mironov (INRIA)