Maximum Mutation Reinforcement Learning for Scalable Control.
Advances in Reinforcement Learning (RL) have successfully tackled sample efficiency and overestimation bias. However, these methods often fall short of scalable performance. On the other hand, genetic methods provide scalability but depict hyperparameter sensitivity to evolutionary operations. We present the Evolution-based Soft Actor-Critic (ESAC), a scalable RL algorithm. Our contributions are threefold; ESAC (1) abstracts exploration from exploitation by combining Evolution Strategies (ES) with Soft Actor-Critic (SAC), (2) provides dominant skill transfer between offsprings by making use of soft winner selections and genetic crossovers in hindsight and (3) improves hyperparameter sensitivity in evolutions using Automatic Mutation Tuning (AMT). AMT gradually replaces the entropy framework of SAC allowing the population to succeed at the task while acting as randomly as possible, without making use of backpropagation updates. On a range of challenging control tasks consisting of high-dimensional action spaces and sparse rewards, ESAC demonstrates state-of-the-art performance and sample efficiency equivalent to SAC. ESAC demonstrates scalability comparable to ES on the basis of hardware resources and algorithm overhead.