Paper Title

Applying Occam's Razor to Transformer-Based Dependency Parsing: What Works, What Doesn't, and What is Really Necessary

Authors

Stefan Grünewald, Annemarie Friedrich, Jonas Kuhn

Abstract

The introduction of pre-trained transformer-based contextualized word embeddings has led to considerable improvements in the accuracy of graph-based parsers for frameworks such as Universal Dependencies (UD). However, previous works differ in various dimensions, including their choice of pre-trained language models and whether they use LSTM layers. With the aims of disentangling the effects of these choices and identifying a simple yet widely applicable architecture, we introduce STEPS, a new modular graph-based dependency parser. Using STEPS, we perform a series of analyses on the UD corpora of a diverse set of languages. We find that the choice of pre-trained embeddings has by far the greatest impact on parser performance and identify XLM-R as a robust choice across the languages in our study. Adding LSTM layers provides no benefits when using transformer-based embeddings. A multi-task training setup outputting additional UD features may contort results. Taking these insights together, we propose a simple but widely applicable parser architecture and configuration, achieving new state-of-the-art results (in terms of LAS) for 10 out of 12 diverse languages.
