Authors: Amir R. Zamir, Alexander Sax, Nikhil Cheerla, Rohan Suri, Zhangjie Cao, Jitendra Malik, Leonidas J. Guibas Description: Visual perception entails solving a wide set of tasks (e.g., object detection, depth estimation, etc). The predictions made for different tasks out of one image are not independent, and therefore, are expected to be 'consistent'. We propose a flexible and fully computational framework for learning while enforcing Cross-Task Consistency (X-TAC). The proposed formulation is based on 'inference path invariance' over an arbitrary graph of prediction domains. We observe that learning with cross-task consistency leads to more accurate predictions, better generalization to out-of-distribution samples, and improved sample efficiency. This framework also leads to a powerful unsupervised quantity, called 'Consistency Energy, based on measuring the intrinsic consistency of the system. Consistency Energy well correlates with the supervised error (r=0.67), thus it can be employed as an unsupervised robustness metric as well as for detection of out-of-distribution inputs (AUC=0.99). The evaluations were performed on multiple datasets, including Taskonomy, Replica, CocoDoom, and ApolloScape.
Full Paper Link: https://arxiv.org/abs/2006.04096