SIGIR 2020 Presentation - Policy-Aware Unbiased Learning to Rank for Top-k Rankings
Aug 04, 2020 (12 views)
Harrie Oosterhuis
The SIGIR'20 pre-recorded presentation for our full paper:
Policy-Aware Unbiased Learning to Rank for Top-k Rankings
Harrie Oosterhuis and Maarten de Rijke

Preprint available here:


Slides available at

Paper abstract:
Counterfactual Learning to Rank (LTR) methods optimize ranking systems using logged user interactions that contain interaction biases. Existing methods are only unbiased if users are presented with all relevant items in every ranking. There is currently no existing counterfactual unbiased LTR method for top-k rankings. We introduce a novel policy-aware counterfactual estimator for LTR metrics that can account for the effect of a stochastic logging policy. We prove that the policy-aware estimator is unbiased if every relevant item has a non-zero probability to appear in the top-k ranking. Our experimental results show that the performance of our estimator is not affected by the size of k: for any k, the policy-aware estimator reaches the same retrieval performance while learning from top-k feedback as when learning from feedback on the full ranking. Lastly, we introduce novel extensions of traditional LTR methods to perform counterfactual LTR and to optimize top-k metrics. Together, our contributions introduce the first policy-aware unbiased LTR approach that learns from top-k feedback and optimizes top-k metrics. As a result, counterfactual LTR is now applicable to the very prevalent top-k ranking setting in search and recommendation.
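
Illustrative sketch (not the authors' implementation): the abstract's central condition is that every relevant item must have a non-zero probability of appearing in the top-k. The toy Python example below shows one way to read that condition. It assumes a small, fully enumerable logging policy, a rank-based examination model, and hypothetical names (policy_aware_propensities, ips_estimate, position_bias, item_weights) that do not come from the paper or the video. An item's propensity is taken as its examination probability averaged over the rankings the logging policy can show, with ranks beyond the cutoff k never observed.

```python
from itertools import permutations


def policy_aware_propensities(policy, position_bias, k):
    """Expected observation probability of each item under a logging policy.

    policy: dict mapping a ranking (tuple of items) to its display probability.
    position_bias: assumed examination probability per rank (index 0 = top).
    k: cutoff; items ranked at index k or lower are never shown.
    """
    propensities = {}
    for ranking, prob in policy.items():
        for rank, item in enumerate(ranking):
            # Items outside the top-k cannot be observed at all.
            observed = position_bias[rank] if rank < k else 0.0
            propensities[item] = propensities.get(item, 0.0) + prob * observed
    return propensities


def ips_estimate(clicked_items, item_weights, propensities):
    """Inverse-propensity-weighted sum over clicked items.

    item_weights is a hypothetical per-item metric weight. The sum is only
    defined when every clicked item has a non-zero propensity.
    """
    return sum(item_weights[d] / propensities[d] for d in clicked_items)


items = ("a", "b", "c")
position_bias = [1.0, 0.5, 0.25]  # assumed values, for illustration only
k = 2

# A deterministic policy always shows the same ranking: "c" never enters the top-2.
deterministic = {("a", "b", "c"): 1.0}
# A uniformly random policy gives every item a chance to enter the top-2.
uniform = {ranking: 1.0 / 6 for ranking in permutations(items)}

det_props = policy_aware_propensities(deterministic, position_bias, k)
uni_props = policy_aware_propensities(uniform, position_bias, k)

print(det_props)
print(uni_props)
```

Under the deterministic policy, item "c" gets a propensity of zero, so a click-based estimate can never account for it. Under the uniform policy, every item has a positive propensity and inverse weighting is well defined. Enumerating all rankings is only feasible for toy cases like this one. How propensities are obtained in practice is outside the scope of this sketch.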

Video references:
A. Agarwal, X. Wang, C. Li, M. Bendersky, and M. Najork. Addressing trust bias for unbiased learning-to-rank. In The World Wide Web Conference, pages 4–14. ACM, 2019.
N. Craswell, O. Zoeter, M. Taylor, and B. Ramsey. An experimental comparison of click position-bias models. In Proceedings of the 2008 International Conference on Web Search and Data Mining, pages 87–94. ACM, 2008.
T. Joachims, A. Swaminathan, and T. Schnabel. Unbiased learning-to-rank with biased feedback. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pages 781–789. ACM, 2017.
Z. Ovaisi, R. Ahsan, Y. Zhang, K. Vasilaky, and E. Zheleva. Correcting for selection bias in learning-to-rank systems. In Proceedings of The Web Conference 2020, pages 1863–1873, 2020.
X. Wang, M. Bendersky, D. Metzler, and M. Najork. Learning to rank with selection bias in personal search. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, pages 115–124. ACM, 2016.
X. Wang, C. Li, N. Golbandi, M. Bendersky, and M. Najork. The LambdaLoss framework for ranking metric optimization. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pages 1313–1322. ACM, 2018.
SIGIR 2020