Multimodal Graph Networks for Compositional Generalization in Visual Question Answering