Paper Title
DiMS: Distilling Multiple Steps of Iterative Non-Autoregressive Transformers for Machine Translation
Paper Authors
Paper Abstract
The computational benefits of iterative non-autoregressive transformers decrease as the number of decoding steps increases. As a remedy, we introduce Distill Multiple Steps (DiMS), a simple yet effective distillation technique to decrease the number of required steps to reach a certain translation quality. The distilled model enjoys the computational benefits of early iterations while preserving the enhancements from several iterative steps. DiMS relies on two models, namely a student and a teacher. The student is optimized to predict the output of the teacher after multiple decoding steps, while the teacher follows the student via a slow-moving average. The moving average keeps the teacher's knowledge updated and enhances the quality of the labels provided by the teacher. During inference, the student is used for translation and no additional computation is added. We verify the effectiveness of DiMS on various models, obtaining 7.8 and 12.9 BLEU point improvements in single-step translation accuracy on the distilled and raw versions of WMT'14 De-En.
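The abstract describes two coupled updates: the student is trained toward the teacher's multi-step output, and the teacher tracks the student via a slow-moving (exponential moving) average. A minimal sketch of that teacher update, using plain Python dicts in place of real model parameters; the function name and the decay value are illustrative assumptions, not from the paper:

```python
def ema_update(teacher_params, student_params, decay=0.999):
    """Slow-moving-average teacher update: after each student optimization
    step, the teacher's parameters drift toward the student's.
    `decay` close to 1.0 makes the teacher change slowly (value assumed)."""
    return {
        name: decay * teacher_params[name] + (1.0 - decay) * student_params[name]
        for name in teacher_params
    }

# Illustrative training loop skeleton (model/loss calls are placeholders):
#   for batch in data:
#       with_no_grad: target = teacher.decode_k_steps(batch)   # multi-step labels
#       loss = distill_loss(student.decode_one_step(batch), target)
#       loss.backward(); optimizer.step()
#       teacher_params = ema_update(teacher_params, student_params)
```

At inference time only the student runs, so the moving-average machinery adds no decoding cost.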