[KDD 2020] Stable Learning via Differentiated Variable Decorrelation
Aug 13, 2020
Recently, as applications of artificial intelligence have gradually seeped into risk-sensitive areas such as justice, healthcare, and autonomous driving, research interest in model stability and robustness has surged in the field of machine learning. Rather than purely fitting the observed training data, stable learning tries to learn a model with uniformly good performance under non-stationary and agnostic testing data. The key challenge of stable learning in practice is that we have no a priori knowledge of the true model or the test data distribution. Under such conditions, we cannot expect a faithful estimation of model parameters or stable performance across wildly changing environments. Previous methods resort to a reweighting scheme that removes the correlations between all the variables through a set of new sample weights. However, we argue that such aggressive decorrelation between all the variables may excessively reduce the effective sample size, which leads to variance inflation and possible underperformance. In this paper, we incorporate unlabeled data from multiple environments into the variable decorrelation framework and propose a Differentiated Variable Decorrelation (DVD) algorithm based on the clustering of variables. Specifically, the variables are clustered according to the stability of their correlations, and the variable decorrelation module learns a set of sample weights to remove the correlations only between variables of different clusters. Empirical studies on both synthetic and real-world datasets clearly demonstrate the efficacy of our DVD algorithm in improving model parameter estimation and prediction stability over changing distributions.
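To make the reweighting idea concrete, here is a minimal sketch of learning sample weights that shrink the weighted covariance only between variables assigned to different clusters. It is an illustration under stated assumptions, not the DVD implementation from the paper: the cluster labels are taken as given (the paper derives them from the stability of correlations across environments, which is not reproduced here), and the function names, the softmax parametrization, the `lam` regularizer on the weights, and the backtracking gradient descent are all hypothetical choices made for this example. The `effective_sample_size` helper uses the standard Kish formula to show how uneven weights reduce the usable sample size.

```python
import numpy as np


def cluster_mask(clusters):
    """Return a (p, p) array: 1.0 where two variables are in different clusters."""
    c = np.asarray(clusters)
    return (c[:, None] != c[None, :]).astype(float)


def cross_cluster_penalty(w, X, mask):
    """Sum of squared weighted covariances between variables of different clusters.

    Also returns the weighted mean and weighted covariance matrix.
    """
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                               # normalize weights to sum to 1
    m = X.T @ w                                   # weighted mean, shape (p,)
    C = (X * w[:, None]).T @ X - np.outer(m, m)   # weighted covariance, shape (p, p)
    return float(np.sum((mask * C) ** 2)), m, C


def effective_sample_size(w):
    """Kish effective sample size: (sum w)^2 / sum(w^2)."""
    w = np.asarray(w, dtype=float)
    return float(w.sum() ** 2 / np.sum(w ** 2))


def _softmax(theta):
    e = np.exp(theta - theta.max())               # shift for numerical stability
    return e / e.sum()


def learn_sample_weights(X, clusters, lam=0.1, n_iter=200):
    """Hypothetical sketch: gradient descent with backtracking on softmax weights.

    Minimizes cross-cluster penalty + lam * n * sum(w^2); the second term
    (an assumption of this sketch) discourages weights from collapsing
    onto a few samples.
    """
    n = X.shape[0]
    mask = cluster_mask(clusters)

    def objective(theta):
        w = _softmax(theta)
        pen, m, C = cross_cluster_penalty(w, X, mask)
        return pen + lam * n * np.sum(w ** 2), w, m, C

    theta = np.zeros(n)                           # start from uniform weights
    obj, w, m, C = objective(theta)
    for _ in range(n_iter):
        G = 2.0 * mask * C                        # d(penalty)/dC, symmetric
        # d(penalty)/dw_i = x_i^T G x_i - 2 x_i^T G m
        g = np.einsum("ij,jk,ik->i", X, G, X) - 2.0 * X @ (G @ m)
        g = g + 2.0 * lam * n * w                 # gradient of the regularizer
        grad = w * (g - w @ g)                    # chain rule through the softmax
        t = float(n)
        while t > 1e-12:                          # backtracking: halve until improvement
            cand = objective(theta - t * grad)
            if cand[0] < obj:
                theta = theta - t * grad
                obj, w, m, C = cand
                break
            t *= 0.5
        else:
            break                                 # no improving step found; stop
    return n * w                                  # rescale so weights average to 1
```

Calling `learn_sample_weights(X, clusters)` on an `(n, p)` data matrix with one cluster label per column returns `n` non-negative weights that average to 1. Setting every variable in its own cluster recovers decorrelation between all pairs, and comparing `effective_sample_size` for the two settings gives a rough feel for the trade-off the abstract describes. In this sketch a larger `lam` keeps the weights closer to uniform at the cost of leaving more cross-cluster covariance in place.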