[KDD 2020] Discovering Functional Dependencies from Mixed-Type Data
Aug 13, 2020
Given complex data collections, practitioners can perform nonparametric functional dependency discovery (FDD) to uncover relationships between variables that were previously unknown. However, known FDD methods are applicable to nominal data, and in practice non-nominal variables are discretized, e.g., in a pre-processing step. This is problematic because, as soon as a mix of discrete and continuous variables is involved, the interaction of discretization with the various dependency measures from the literature is poorly understood. In particular, it is unclear whether a given discretization method even leads to a consistent dependency estimate. In this paper, we analyze these fundamental questions and derive formal criteria as to when a discretization process applied to a mixed set of random variables leads to consistent estimates of mutual information. With these insights, we derive an estimator framework applicable to any task that involves estimating mutual information from multivariate and mixed-type data. Last, we use this framework to extend a previously proposed FDD approach for reliable dependencies. Experimental evaluation shows that the derived reliable estimator is both computationally and statistically efficient, and leads to effective FDD algorithms for mixed-type data.
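To make the setting concrete, here is a minimal sketch of the common pre-processing approach the abstract refers to: a continuous variable is discretized, and mutual information with a nominal variable is then estimated by plugging in empirical frequencies. This is not the estimator derived in the paper. The function names, the equal-frequency binning scheme, and the choice of roughly sqrt(n) bins are all assumptions made for illustration, and no consistency guarantee is claimed for them.

```python
import math
import random
from collections import Counter


def equal_frequency_bins(values, n_bins):
    """Assign each continuous value a bin index so that bins hold roughly equal counts.

    Illustrative helper (hypothetical name). Bins are assigned by rank, so tied
    values may land in different bins.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = min(rank * n_bins // len(values), n_bins - 1)
    return bins


def plugin_mutual_information(xs, ys):
    """Plug-in (empirical-frequency) mutual information, in nats, of two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written in terms of raw counts
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi


# Synthetic mixed-type data: X is nominal, Y is continuous.
rng = random.Random(0)
n = 2000
x = [rng.choice("ab") for _ in range(n)]
y_dep = [rng.gauss(0.0 if xi == "a" else 3.0, 1.0) for xi in x]  # Y depends on X
y_ind = [rng.gauss(0.0, 1.0) for _ in range(n)]                  # Y independent of X

n_bins = round(math.sqrt(n))  # assumption: an arbitrary illustrative bin count
mi_dep = plugin_mutual_information(x, equal_frequency_bins(y_dep, n_bins))
mi_ind = plugin_mutual_information(x, equal_frequency_bins(y_ind, n_bins))
print(f"dependent pair:   {mi_dep:.3f} nats")
print(f"independent pair: {mi_ind:.3f} nats")
```

The plug-in estimate is never negative, so the independent pair generally yields a small positive value even though the true mutual information is zero, and that value changes with the number of bins. This sensitivity to the discretization is the kind of interaction the abstract describes as poorly understood.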