[KDD 2020] Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding
Aug 13, 2020
Mining a set of meaningful topics organized into a hierarchy is intuitively appealing since topic correlations are ubiquitous in massive text corpora. To account for potential hierarchical topic structures, hierarchical topic models generalize flat topic models by incorporating latent topic hierarchies into their generative modeling process. However, due to their purely unsupervised nature, the learned topic hierarchy often deviates from users’ particular needs or interests. To guide the hierarchical topic discovery process with minimal user supervision, we propose a new task, Hierarchical Topic Mining, which takes a category tree described by category names only, and aims to mine a set of representative terms for each category from a text corpus to help users comprehend their topics of interest. We develop a novel joint tree and text embedding method along with a principled optimization procedure that allows simultaneous modeling of the category tree structure and the corpus generative process in the spherical space for effective category-representative term discovery. Our comprehensive experiments show that our model, named JoSH, mines a high-quality set of hierarchical topics with high efficiency and benefits weakly-supervised hierarchical text classification tasks.
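To make the task's input and output concrete, here is a minimal sketch in Python. It is not the JoSH method: it skips the joint tree and text embedding optimization entirely and only illustrates the final retrieval idea, assuming that category names and terms already live on a unit sphere where closeness is measured by cosine similarity. The tree, the vocabulary, the hand-made 3-dimensional vectors, and the helper names (`normalize`, `cosine`, `walk`, `top_terms`) are all hypothetical.

```python
import math

def normalize(v):
    """Project a vector onto the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(u, v):
    """Cosine similarity of two unit vectors (reduces to a dot product)."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical input: a category tree described by category names only.
TREE = {"sports": ["soccer", "tennis"]}

# Toy, hand-made vectors standing in for learned embeddings (assumption).
RAW = {
    "sports": [1.0, 1.0, 0.0],
    "soccer": [1.0, 0.1, 0.0],
    "tennis": [0.1, 1.0, 0.0],
    "goalkeeper": [0.9, 0.0, 0.1],
    "racket": [0.0, 0.9, 0.1],
    "election": [0.0, 0.0, 1.0],
}
EMB = {word: normalize(vec) for word, vec in RAW.items()}

def walk(tree):
    """Yield every category name in the tree, parents before children."""
    for parent, children in tree.items():
        yield parent
        yield from children

def top_terms(category, k=1):
    """Return the k non-category terms closest to the category name."""
    category_names = set(walk(TREE))
    candidates = [w for w in EMB if w not in category_names]
    ranked = sorted(
        candidates,
        key=lambda w: cosine(EMB[category], EMB[w]),
        reverse=True,
    )
    return ranked[:k]

if __name__ == "__main__":
    for name in walk(TREE):
        print(name, "->", top_terms(name, k=2))
```

In the paper's setting the embeddings would be learned jointly from the category tree and the corpus, which is what lets each category's representative terms reflect its position in the hierarchy. The sketch above assumes that step has already happened.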