O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers

NeurIPS 2020