Paper Title

Improve Transformer Models with Better Relative Position Embeddings

Authors

Zhiheng Huang, Davis Liang, Peng Xu, Bing Xiang

Abstract

Transformer architectures rely on explicit position encodings in order to preserve a notion of word order. In this paper, we argue that existing work does not fully utilize position information. For example, the initial proposal of a sinusoid embedding is fixed and not learnable. In this paper, we first review absolute position embeddings and existing methods for relative position embeddings. We then propose new techniques that encourage increased interaction between query, key and relative position embeddings in the self-attention mechanism. Our most promising approach is a generalization of the absolute position embedding, improving results on SQuAD1.1 compared to previous position embedding approaches. In addition, we address the inductive property of whether a position embedding can be robust enough to handle long sequences. We demonstrate empirically that our relative position embedding method is reasonably generalized and robust from the inductive perspective. Finally, we show that our proposed method can be adopted as a near drop-in replacement for improving the accuracy of large models with a small computational budget.
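The abstract describes encouraging more interaction between queries, keys, and relative position embeddings inside self-attention. The snippet below is a minimal single-head NumPy sketch of that general idea, not the paper's exact formulation: the function name relative_attention_scores, the clipped relative-distance window, and the particular score combination (content term plus query-relative and key-relative interaction terms) are illustrative assumptions.

```python
import numpy as np

def relative_attention_scores(q, k, rel_emb, max_dist):
    """Single-head attention scores with relative position embeddings.

    q, k:    (seq_len, d) query and key matrices.
    rel_emb: (2 * max_dist + 1, d) table of relative position embeddings,
             indexed by the clipped signed distance (j - i) + max_dist.
    Returns a (seq_len, seq_len) matrix of unnormalized attention scores.
    """
    seq_len, d = q.shape
    idx = np.arange(seq_len)
    # Clipped relative distances between positions, shape (seq_len, seq_len).
    dist = np.clip(idx[None, :] - idx[:, None], -max_dist, max_dist) + max_dist
    a = rel_emb[dist]                       # (seq_len, seq_len, d)

    content = q @ k.T                       # query-key interaction
    q_rel = np.einsum("id,ijd->ij", q, a)   # query-relative interaction
    k_rel = np.einsum("jd,ijd->ij", k, a)   # key-relative interaction
    return (content + q_rel + k_rel) / np.sqrt(d)

# Toy usage: random queries/keys over a short sequence.
rng = np.random.default_rng(0)
seq_len, d, max_dist = 6, 8, 4
q = rng.normal(size=(seq_len, d))
k = rng.normal(size=(seq_len, d))
rel_emb = rng.normal(size=(2 * max_dist + 1, d))
scores = relative_attention_scores(q, k, rel_emb, max_dist)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
print(weights.shape)  # (6, 6)
```

Because the relative embeddings are indexed only by clipped distance rather than absolute position, a formulation of this kind can, in principle, be applied to sequences longer than those seen during training, which is the inductive property the abstract refers to.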
