Span-based discontinuous constituency parsing: a family of exact chart-based algorithms with time complexities from O(n^6) down to O(n^3)

Paper title

Span-based discontinuous constituency parsing: a family of exact chart-based algorithms with time complexities from O(n^6) down to O(n^3)

Author

Corro, Caio

Abstract

We introduce a novel chart-based algorithm for span-based parsing of discontinuous constituency trees of block degree two, including ill-nested structures. In particular, we show that we can build variants of our parser with smaller search spaces and time complexities ranging from $\mathcal O(n^6)$ down to $\mathcal O(n^3)$. The cubic time variant covers 98% of constituents observed in linguistic treebanks while having the same complexity as continuous constituency parsers. We evaluate our approach on German and English treebanks (Negra, Tiger and Discontinuous PTB) and report state-of-the-art results in the fully supervised setting. We also experiment with pre-trained word embeddings and BERT-based neural networks.
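
The abstract restricts the parser to constituents of block degree two. As a minimal illustration of that notion (not the paper's algorithm), the sketch below counts the maximal contiguous runs of token positions covered by a constituent. A continuous constituent has block degree one, and a constituent with a single gap has block degree two. The function names and the example positions are hypothetical and chosen only for illustration.

```python
def blocks(positions):
    """Group token indices into maximal contiguous runs (start, end), inclusive."""
    runs = []
    for p in sorted(set(positions)):
        if runs and p == runs[-1][1] + 1:
            # Position extends the current run by one token.
            runs[-1][1] = p
        else:
            # A gap (or the first position) starts a new run.
            runs.append([p, p])
    return [(start, end) for start, end in runs]


def block_degree(positions):
    """Number of contiguous blocks in a constituent's yield."""
    return len(blocks(positions))


# Hypothetical constituent covering tokens 0-1 and 4-5, with a gap at 2-3.
example = [0, 1, 4, 5]
print(blocks(example))        # [(0, 1), (4, 5)]
print(block_degree(example))  # 2
```

Under this definition, a chart item for a block-degree-two constituent needs four indices (two per block) instead of the two indices used for a continuous span, which is why unrestricted combination of such items is more expensive than continuous chart parsing.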
