On the linearity of large non-linear models: when and why the tangent kernel is constant

NeurIPS 2020