Beyond User Self-Reported Likert Scale Ratings: A Comparison Model for Automatic Dialog Evaluation

ACL 2020