Designing Precise and Robust Dialogue Response Evaluators