Paper Title
P-Transformer: Towards Better Document-to-Document Neural Machine Translation
Paper Authors
Paper Abstract
Directly training a document-to-document (Doc2Doc) neural machine translation (NMT) model via Transformer from scratch, especially on small datasets, usually fails to converge. Our dedicated probing tasks show that 1) both absolute and relative position information is gradually weakened or even vanishes in the upper encoder layers, and 2) the vanishing of absolute position information in the encoder output causes the training failure of Doc2Doc NMT. To alleviate this problem, we propose a position-aware Transformer (P-Transformer) to enhance both the absolute and relative position information in self-attention and cross-attention. Specifically, we integrate absolute position information, i.e., position embeddings, into the query-key pairs in both self-attention and cross-attention through a simple yet effective addition operation. Moreover, we also integrate relative position encoding into self-attention. The proposed P-Transformer utilizes sinusoidal position encoding and does not require any task-specific position embedding, segment embedding, or attention mechanism. With the above methods, we build a Doc2Doc NMT model with P-Transformer, which ingests the source document and completely generates the target document in a sequence-to-sequence (seq2seq) manner. In addition, P-Transformer can be applied to seq2seq-based document-to-sentence (Doc2Sent) and sentence-to-sentence (Sent2Sent) translation. Extensive experimental results on Doc2Doc NMT show that P-Transformer significantly outperforms strong baselines on 9 widely-used document-level datasets in 7 language pairs, covering small, medium, and large scales, and achieves a new state of the art. Experiments on discourse phenomena show that our Doc2Doc NMT models improve translation quality in terms of both BLEU and discourse coherence. We make our code available on GitHub.
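To make the "simple yet effective addition operation" concrete, below is a minimal PyTorch sketch (not the authors' released implementation) of position-aware attention in the spirit of the abstract: sinusoidal absolute position embeddings are added to the query and key representations before computing attention scores, and an optional relative-position bias term can be added to the self-attention logits. All names here (`sinusoidal_positions`, `position_aware_attention`, `rel_bias`) are illustrative assumptions, not identifiers from the paper's code.

```python
# Sketch of position-aware attention as described in the abstract (illustrative only).
import math
import torch
import torch.nn.functional as F


def sinusoidal_positions(seq_len: int, d_model: int) -> torch.Tensor:
    """Standard Transformer sinusoidal position encoding, shape (seq_len, d_model)."""
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                    * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe


def position_aware_attention(q, k, v, rel_bias=None):
    """Scaled dot-product attention where q/k already carry absolute position
    information; an optional relative-position bias (self-attention case) is
    added to the attention logits."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)   # (batch, q_len, k_len)
    if rel_bias is not None:
        scores = scores + rel_bias                       # relative position term
    return F.softmax(scores, dim=-1) @ v


# Toy usage: single-head cross-attention, batch of 2 documents.
d_model, src_len, tgt_len = 16, 6, 5
src = torch.randn(2, src_len, d_model)                   # encoder states
tgt = torch.randn(2, tgt_len, d_model)                   # decoder states

# "Simple yet effective addition": inject absolute positions into queries and keys
# (queries use target positions, keys use source positions in cross-attention).
q = tgt + sinusoidal_positions(tgt_len, d_model)
k = src + sinusoidal_positions(src_len, d_model)
out = position_aware_attention(q, k, src)                # no rel_bias for cross-attention
print(out.shape)                                         # torch.Size([2, 5, 16])
```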