Title
Rewriter-Evaluator Architecture for Neural Machine Translation
Authors
Abstract
Encoder-decoder has been widely used in neural machine translation (NMT). A few methods have been proposed to improve it with multiple passes of decoding. However, their full potential is limited by a lack of appropriate termination policies. To address this issue, we present a novel architecture, Rewriter-Evaluator. It consists of a rewriter and an evaluator. Translating a source sentence involves multiple passes. In every pass, the rewriter produces a new translation to improve the previous one, and the evaluator estimates the translation quality to decide whether to terminate the rewriting process. We also propose prioritized gradient descent (PGD), which facilitates jointly training the rewriter and the evaluator. Although it incurs multiple passes of decoding, Rewriter-Evaluator with the proposed PGD method can be trained in a time comparable to that of training encoder-decoder models. We apply the proposed architecture to improve general NMT models (e.g., Transformer). We conduct extensive experiments on two translation tasks, Chinese-English and English-German, and show that the proposed architecture notably improves the performance of NMT models and significantly outperforms previous baselines.
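The inference loop described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: `rewriter` and `evaluator` are hypothetical callables standing in for the trained models, and the termination rule shown (stop when the evaluator's quality estimate no longer improves, or after a fixed pass budget) is one plausible instantiation of the termination policy.

```python
def translate_multipass(src, rewriter, evaluator, max_passes=5):
    """Multi-pass translation: rewrite the previous translation each pass,
    and let the evaluator's quality score decide when to stop."""
    translation = ""             # no translation before the first pass
    best_score = float("-inf")
    for _ in range(max_passes):
        # The rewriter conditions on both the source and the past translation.
        candidate = rewriter(src, translation)
        score = evaluator(src, candidate)
        # Terminate once the estimated quality stops improving.
        if score <= best_score:
            break
        translation, best_score = candidate, score
    return translation


# Toy stand-ins to exercise the loop: the "rewriter" extends the output by
# one token per pass, and the "evaluator" scores by output length.
_target = ["a", "b", "c"]

def toy_rewriter(src, prev):
    n = len(prev.split())
    return " ".join(_target[: n + 1]) if n < len(_target) else prev

def toy_evaluator(src, cand):
    return len(cand.split())

print(translate_multipass("src", toy_rewriter, toy_evaluator))
```

With the toy models above, the loop improves the translation for three passes and then terminates on the fourth, when the evaluator's score plateaus.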