[KDD 2020] GHashing: Semantic Graph Hashing for Approximate Similarity Search in Graph Databases
CrossMind.ai logo

[KDD 2020] GHashing: Semantic Graph Hashing for Approximate Similarity Search in Graph Databases

Dec 16, 2020
Graph similarity search aims to find the most similar graphs to,a query in a graph database in terms of a given proximity measure, say Graph Edit Distance (GED). It is a widely studied yet still,challenging problem. Most of the studies are based on the pruningverification framework, which first prunes non-promising graphs,and then conducts verification on the small candidate set. Existing,methods are capable of managing databases with thousands or tens,of thousands of graphs, but fail to scale to even larger databases,,due to their exact pruning strategy. Inspired by the recent success,of deep-learning-based semantic hashing in image and document,retrieval, we propose a novel graph neural network (GNN) based,semantic hashing, i.e.,GHashing,, for approximate pruning. We,first train a GNN with ground-truth GED results so that it learns to,generate embeddings and hash codes that preserve GED between,graphs. Then a hash index is built to enable graph lookup in constant,time. To answer a query, we use the hash codes and the continuous,embeddings as two-level pruning to retrieve the most promising,candidates, which are sent to the exact solver for final verification.,Due to the approximate pruning strategy leveraged by our graph,hashing technique, our approach achieves significantly faster query,time compared to state-of-the-art methods while maintaining a high,recall. Experiments show that our approach is on average,20,×,faster,than the only baseline that works on million-scale databases, which,demonstrates GHashing successfully provides a new direction in,addressing graph search problem for large-scale graph databases.