Disentangling Trainability and Generalization in Deep Neural Networks