Learning Speech Representations from Raw Audio by Joint Audiovisual Self-Supervision

ICML 2020