Language is designed to convey useful information about the world, thus serving as a scaffold for efficient human learning. How can we let language guide representation learning in machine learning models? We explore this question in the setting of few-shot visual classification, proposing models that learn to perform visual classification while jointly predicting natural language task descriptions at train time. At test time, with no language available, we find that these language-influenced visual representations generalize better than both meta-learning baselines and approaches that explicitly use language as a bottleneck for classification.
Speakers: Jesse Mu, Percy Liang, Noah Goodman
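
As a rough illustration of what "jointly predicting natural language task descriptions at train time" could look like, the sketch below combines a classification loss with an auxiliary language-prediction loss. It is a minimal sketch under stated assumptions, not the speakers' implementation: the function names, the per-token averaging, and the `lang_weight` hyperparameter are all hypothetical, and the logits are assumed to come from some image encoder and description decoder that are not shown.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def joint_loss(class_logits, class_label, token_logits, token_ids, lang_weight=1.0):
    """Classification loss plus a weighted language-prediction loss.

    Assumptions (not from the abstract): `lang_weight` is a hypothetical
    hyperparameter, and the language loss is averaged over description tokens.
    """
    cls = cross_entropy(class_logits, class_label)
    # Mean per-token loss for predicting the task description.
    lang = sum(
        cross_entropy(step_logits, token)
        for step_logits, token in zip(token_logits, token_ids)
    ) / len(token_ids)
    return cls + lang_weight * lang
```

In this sketch the language term only shapes training. At test time, when no descriptions are available, a model trained this way would use the classification path alone, which corresponds to calling `cross_entropy` on the class logits or setting `lang_weight=0`.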