[KDD 2020] Discovering Succinct Pattern Sets Expressing Co-Occurrence and Mutual Exclusivity
Aug 13, 20203 views
Pattern mining is one of the core topics of data mining. We consider,the problem of mining a succinct set of patterns that together explain the data in terms of mutual exclusivity and co-occurence. That,is, we extend the traditional pattern languages beyond conjunctions, enabling us to capture more complex relationships, such as,replacable sub-components or antagonists in biological pathways.,We formally define this problem in terms of the Minimum Description Length principle, by which we identify the best set of,patterns as the one that most succinctly describes the data. To,avoid spurious results—in sparse data mutual exclusivity is likely,just due to chance—we propose an efficient statistical test for,K,ary mutual exclusivity. As the search space for the optimal model,is enormous and unstructured, we propose,Mexican,, a heuristic,algorithm to efficiently discover high quality sets of patterns of cooccurences and mutual exclusivity. Through extensive experiments,we show that,Mexican,recovers the ground truth on synthetic data,,and meaningful results on real-world data. Both in stark contrast,to the state of the art, that result in millions of spurious patterns.