A Recipe for Creating Multimodal Aligned Datasets for Sequential Tasks

ACL 2020