Paper Title

The RoyalFlush System for the WMT 2022 Efficiency Task

Paper Authors

Bo Qin, Aixin Jia, Qiang Wang, Jianning Lu, Shuqin Pan, Haibo Wang, Ming Chen

Paper Abstract

This paper describes the submission of the RoyalFlush neural machine translation system to the WMT 2022 translation efficiency task. Unlike the commonly used autoregressive translation systems, we adopt a two-stage translation paradigm called Hybrid Regression Translation (HRT) to combine the advantages of autoregressive and non-autoregressive translation. Specifically, HRT first autoregressively generates a discontinuous sequence (e.g., making a prediction every $k$ tokens, $k>1$) and then fills in all previously skipped tokens at once in a non-autoregressive manner. Thus, we can easily trade off translation quality against speed by adjusting $k$. In addition, by integrating other modeling techniques (e.g., sequence-level knowledge distillation and a deep-encoder-shallow-decoder layer allocation strategy) together with substantial engineering effort, HRT improves inference speed by 80\% while achieving translation performance equivalent to that of its same-capacity autoregressive translation (AT) counterpart. Our fastest system reaches 6k+ words/second in the GPU latency setting, estimated to be about 3.1x faster than last year's winner.
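
To make the two-stage decoding procedure concrete, below is a minimal Python sketch of HRT-style inference. The `ar_predict` and `nar_fill` functions are hypothetical placeholders standing in for the autoregressive and non-autoregressive model passes; they are not the authors' actual API, and the toy logic only illustrates the control flow: stage 1 emits one anchor token per $k$ target positions, and stage 2 fills all skipped positions in a single parallel step.

```python
# A minimal sketch of HRT's two-stage decoding, assuming hypothetical
# ar_predict/nar_fill interfaces (not the authors' actual implementation).
from typing import List, Optional

K = 2          # chunk size k: stage 1 predicts one token per k positions (k > 1)
EOS = "</s>"   # end-of-sequence marker

def ar_predict(src: List[str], prefix: List[str]) -> str:
    """Hypothetical autoregressive pass: predict the next anchor token
    given the source and the discontinuous prefix generated so far."""
    i = len(prefix) * K            # toy stand-in: copy source tokens at stride K
    return src[i] if i < len(src) else EOS

def nar_fill(src: List[str], skeleton: List[Optional[str]]) -> List[str]:
    """Hypothetical non-autoregressive pass: fill every skipped slot
    (marked None) in one parallel step."""
    return [tok if tok is not None else "<filled>" for tok in skeleton]

def hrt_decode(src: List[str], max_len: int = 64) -> List[str]:
    # Stage 1 (autoregressive): generate a discontinuous sequence of
    # anchor tokens, one per K target positions.
    anchors: List[str] = []
    while len(anchors) * K < max_len:
        tok = ar_predict(src, anchors)
        if tok == EOS:
            break
        anchors.append(tok)

    # Lay the anchors out at stride K, leaving the skipped slots empty.
    skeleton: List[Optional[str]] = []
    for tok in anchors:
        skeleton.append(tok)
        skeleton.extend([None] * (K - 1))

    # Stage 2 (non-autoregressive): fill all skipped tokens at once.
    return nar_fill(src, skeleton)

if __name__ == "__main__":
    print(hrt_decode("this is a small example sentence".split()))
```

Increasing $k$ shortens the sequential loop in stage 1 (roughly $1/k$ as many autoregressive steps) while asking the non-autoregressive pass to fill more tokens per anchor, which is precisely the quality/speed knob described in the abstract.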
