[KDD 2020] Meta-Learning for Query Conceptualization at Web Scale - CrossMinds.ai
Aug 13, 2020
Di Niu
Concepts naturally constitute an abstraction for fine-grained entities and knowledge in the open domain. They enable search engines and recommendation systems to enhance user experience by discovering the high-level abstraction of a search query and the user intent behind it. In this paper, we study the problem of query conceptualization, which is to find the most appropriate matching concepts for any given search query from a large pool of pre-defined concepts. We propose a coarse-to-fine approach that first reduces the search space for each query through a shortlisting scheme and then identifies the matching concepts using pre-trained language models, which are meta-tuned to our query-concept matching task. Our shortlisting scheme uses a GRU-based Relevant Words Generator (RWG) to expand and complete the context of the given query, and then shortlists the candidate concepts through a scoring mechanism based on word overlaps. To accurately identify the most appropriate matching concepts for a query, even when the concepts have zero verbatim overlap with the query, we meta-fine-tune a BERT pairwise text-matching model under the Reptile meta-learning algorithm, which achieves zero-shot transfer learning on the conceptualization problem. Our two-stage framework can be trained with data derived entirely from a search click graph, without requiring any human labelling effort. For evaluation, we constructed a large click graph based on more than 7 million instances of the click history recorded in the Tencent QQ browser and performed the query conceptualization task over a large ontology with 159,148 unique concepts. Results from a range of evaluation methods, including an offline evaluation procedure on the click graph, human evaluation, online A/B testing and case studies, demonstrate the superiority of our approach over a number of competitive pre-trained language models and fine-tuned neural network baselines.
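The Reptile meta-learning algorithm mentioned in the abstract can be illustrated with a minimal sketch: adapt the model parameters to each task with a few inner-loop SGD steps, then move the meta-parameters toward the average of the adapted parameters. This is a generic illustration of Reptile on toy parameters, not the paper's actual BERT meta-fine-tuning setup; all function and parameter names here are hypothetical.

```python
# Minimal sketch of the Reptile meta-update (Nichol et al., 2018).
# Parameters are represented as a plain dict of floats for clarity;
# in the paper's setting they would be BERT weights.

def sgd_steps(params, grads_fn, lr, k):
    """Run k inner-loop SGD steps on one task; return the adapted params."""
    p = dict(params)
    for _ in range(k):
        grads = grads_fn(p)
        p = {name: value - lr * grads[name] for name, value in p.items()}
    return p

def reptile_update(params, task_grads_fns, inner_lr=0.01, meta_lr=0.1, inner_steps=5):
    """One Reptile outer step: adapt to each sampled task, then move the
    meta-parameters toward the mean of the task-adapted parameters."""
    adapted = [sgd_steps(params, g, inner_lr, inner_steps) for g in task_grads_fns]
    return {
        name: value + meta_lr * (sum(a[name] for a in adapted) / len(adapted) - value)
        for name, value in params.items()
    }
```

For example, with two toy tasks whose losses are (w - 1)² and (w - 3)², one outer step from w = 0 nudges w toward the region between the task optima; repeating the outer loop yields an initialization that adapts quickly to any single task, which is the zero-shot transfer property the paper relies on.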