Exact (Then Approximate) Dynamic Programming for Deep Reinforcement Learning