Poster: A Policy Gradient Method for Task-Agnostic Exploration