Abstract: Metric validation in Grammatical Error Correction (GEC) is currently done by observing the correlation between human and metric-induced rankings. However, such correlation studies are costly, methodologically troublesome, and suffer from low inter-rater agreement. We propose maege, an automatic methodology for GEC metric validation that overcomes many of the difficulties of the existing methodology. Experiments with maege shed new light on metric quality, showing, for example, that the standard M2 metric fares poorly on corpus-level ranking. Moreover, we use maege to perform a detailed analysis of metric behavior, showing that some types of valid edits are consistently penalized by existing metrics.
Authors: Leshem Choshen, Omri Abend (The Hebrew University of Jerusalem)