This episode is an interview with Harm van Seijen of Microsoft Research Montréal, discussing highlights from his paper, "Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning," which was accepted as an oral presentation at the NeurIPS 2019 conference.
Harm is the team lead of the Reinforcement Learning team at Microsoft Research Montréal, which focuses on fundamental challenges in reinforcement learning. His current research interests within reinforcement learning include transfer learning, continual learning, hierarchical approaches, and multi-agent systems. Harm did his PhD at the University of Amsterdam under the supervision of Frans Groen and Shimon Whiteson. After his PhD, he worked for four years as a postdoc in the RLAI group at the University of Alberta, working with Richard Sutton on novel reinforcement-learning methods.
In an effort to better understand the different ways in which the discount factor affects the optimization process in reinforcement learning, we designed a set of experiments to study each effect in isolation. Our analysis reveals that the common perception that poor performance of low discount factors is caused by (too) small action-gaps requires revision. We propose an alternative hypothesis that identifies the size-difference of the action-gap across the state-space as the primary cause. We then introduce a new method that enables more homogeneous action-gaps by mapping value estimates to a logarithmic space. We prove convergence for this method under standard assumptions and demonstrate empirically that it indeed enables lower discount factors for approximate reinforcement-learning methods. This in turn allows tackling a class of reinforcement-learning problems that are challenging to solve with traditional methods.
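To make the action-gap argument concrete, here is a minimal sketch on a toy problem. It is not the paper's algorithm or experimental setup: the chain environment, the function names (`optimal_q`, `action_gaps`, `log_action_gaps`), and the choice of a discount factor of 0.5 are all assumptions made for illustration. It computes the optimal action values by value iteration and compares the gap between the two actions in regular space and in logarithmic space.

```python
import math


def optimal_q(n_states, gamma, sweeps=1000):
    """Value iteration on a hypothetical deterministic chain (illustrative only).

    Actions: 0 = stay (no reward, same state), 1 = forward.
    Moving forward from the last state gives reward 1 and ends the episode.
    Assumes sweeps > n_states so that every value has converged.
    """
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(sweeps):
        v = [max(row) for row in q]  # greedy state values from the previous sweep
        for s in range(n_states):
            q[s][0] = gamma * v[s]
            q[s][1] = 1.0 if s == n_states - 1 else gamma * v[s + 1]
    return q


def action_gaps(q):
    """Gap between the forward and stay action values in regular space."""
    return [row[1] - row[0] for row in q]


def log_action_gaps(q):
    """The same gap after mapping both action values to log space."""
    return [math.log(row[1]) - math.log(row[0]) for row in q]


if __name__ == "__main__":
    q = optimal_q(n_states=10, gamma=0.5)
    for s, (gap, log_gap) in enumerate(zip(action_gaps(q), log_action_gaps(q))):
        print(f"state {s}: gap={gap:.6f}  log-space gap={log_gap:.6f}")
```

In this toy chain the regular-space gap at a state that is k steps before the final state equals gamma^k (1 - gamma), so it shrinks geometrically with distance from the reward, while the log-space gap equals -log(gamma) in every state. That is the kind of size difference across the state space, and the kind of homogenization, that the abstract refers to; the paper's actual method and its convergence conditions are not reproduced here.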