Efficient Knowledge Graph Validation via Cross-Graph Representation Learning

CIKM 2020

Abstract: Recent advances in information extraction have motivated the automatic construction of huge Knowledge Graphs (KGs) by mining from large-scale text corpus. However, noisy facts are unavoidably introduced into KGs that could be caused by automatic extraction. To validate the correctness of facts (i.e., triplets) inside a KG, one possible approach is to map the triplets into vector representations by capturing the semantic meanings of facts. Although many representation learning approaches have been developed for knowledge graphs, these methods are not effective for validation. They usually assume that facts are correct, and thus may overfit noisy facts and fail to detect such facts. Towards effective KG validation, we propose to leverage an external human-curated KG as auxiliary information source to help detect the errors in a target KG. The external KG is built upon human-curated knowledge repositories and tends to have high precision. On the other hand, although the target KG built by information extraction from texts has low precision, it can cover new or domain-specific facts that are not in any human-curated repositories. To tackle this challenging task, we propose a cross-graph representation learning framework, i.e., CrossVal, which can leverage an external KG to validate the facts in the target KG efficiently. This is achieved by embedding triplets based on their semantic meanings, drawing cross-KG negative samples and estimating a confidence score for each triplet based on its degree of correctness. We evaluate the proposed framework on datasets across different domains. Experimental results show that the proposed framework achieves the best performance compared with the state-of-the-art methods on large-scale KGs.