Abstract: We introduce a novel transition system for discontinuous constituency parsing. Instead of storing subtrees in a stack --i.e. a data structure with linear-time sequential access-- the proposed system uses a set of parsing items, with constant-time random access. This change makes it possible to construct any discontinuous constituency tree in exactly 4n−2 transitions for a sentence of length n. At each parsing step, the parser considers every item in the set to be combined with a focus item and to construct a new constituent in a bottom-up fashion. The parsing strategy is based on the assumption that most syntactic structures can be parsed incrementally and that the set --the memory of the parser-- remains reasonably small on average. Moreover, we introduce a provably correct dynamic oracle for the new transition system, and present the first experiments in discontinuous constituency parsing using a dynamic oracle. Our parser obtains state-of-the-art results on three English and German discontinuous treebanks.
Authors: Maximin Coavoux, Shay B. Cohen (Naver Labs Europe, University of Edinburgh)