Abstract: Authorship verification is the problem of inferring whether two texts were written by the same author. For this task, unmasking is one of the most robust approaches as of today with the major shortcoming of only being applicable to book-length texts. In this paper, we present a generalized unmasking approach which allows for authorship verification of texts as short as four printed pages with very high precision at an adjustable recall tradeoff. Our generalized approach therefore reduces the required material by orders of magnitude, making unmasking applicable to authorship cases of more practical proportions. The new approach is on par with other state-of-the-art techniques that are optimized for texts of this length: it achieves accuracies of 75–80%, while also allowing for easy adjustment to forensic scenarios that require higher levels of confidence in the classification.
Authors: Janek Bevendorff, Benno Stein, Matthias Hagen, Martin Potthast (Bauhaus-Universitat Weimar, Leipzig University)