Reward function specification can be difficult, even in simple environments. Realistic environments contain millions of states. Rewarding the agent for making a widget may be easy, but penalizing the multitude of possible negative side effects is hard. In toy environments, Attainable Utility Preservation (AUP) avoids side effects by penalizing shifts in the ability to achieve randomly generated goals. We scale this approach to large, randomly generated environments based on Conway's Game of Life. By preserving optimal value for a single randomly generated reward function, AUP incurs modest overhead, completes the specified task, and avoids side effects.
Speakers: Alex Turner, Neale Ratzlaff, Prasad Tadepalli
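
As a rough illustration of the penalty described in the abstract, the sketch below subtracts a scaled penalty from the task reward. The penalty is the average absolute change in auxiliary action-values between taking an action and doing nothing. It rests on assumptions that the abstract does not state: the function name `aup_reward`, the use of a no-op action as the baseline, the coefficient `lam`, and the absence of any normalization are all illustrative choices. Passing a single auxiliary value corresponds to the single randomly generated reward function mentioned above.

```python
from typing import Sequence


def aup_reward(
    task_reward: float,
    aux_q_action: Sequence[float],
    aux_q_noop: Sequence[float],
    lam: float = 0.1,
) -> float:
    """Hypothetical AUP-style shaped reward (illustrative sketch only).

    task_reward:  reward for the specified task at (state, action).
    aux_q_action: auxiliary action-values Q_i(state, action), one per
                  randomly generated auxiliary reward function.
    aux_q_noop:   auxiliary action-values Q_i(state, no-op), the assumed
                  "do nothing" baseline.
    lam:          assumed penalty coefficient; larger values make the
                  agent more conservative.
    """
    if len(aux_q_action) != len(aux_q_noop) or len(aux_q_action) == 0:
        raise ValueError("need matching, non-empty auxiliary value lists")

    # Average absolute shift in the ability to achieve each auxiliary goal.
    penalty = sum(
        abs(q_a - q_n) for q_a, q_n in zip(aux_q_action, aux_q_noop)
    ) / len(aux_q_action)

    return task_reward - lam * penalty
```

For example, an action that leaves the auxiliary value unchanged keeps the full task reward, while an action that shifts it in either direction is penalized in proportion to the size of the shift.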