Paper Title
Modeling Context With Linear Attention for Scalable Document-Level Translation
Paper Authors
Paper Abstract
Document-level machine translation leverages inter-sentence dependencies to produce more coherent and consistent translations. However, these models, predominantly based on transformers, are difficult to scale to long documents as their attention layers have quadratic complexity in the sequence length. Recent efforts on efficient attention improve scalability, but their effect on document translation remains unexplored. In this work, we investigate the efficacy of a recent linear attention model by Peng et al. (2021) on document translation and augment it with a sentential gate to promote a recency inductive bias. We evaluate the model on IWSLT 2015 and OpenSubtitles 2018 against the transformer, demonstrating substantially increased decoding speed on long sequences with similar or better BLEU scores. We show that sentential gating further improves translation quality on IWSLT.
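To illustrate the idea described above, here is a minimal sketch, not the authors' actual implementation, of causal linear attention whose recurrent state is decayed at sentence boundaries to induce a recency bias. The ELU-based feature map, the fixed scalar `gate`, and the `sent_boundary` input are assumptions for illustration; the paper's sentential gate and the specific linear attention variant of Peng et al. (2021) may differ.

```python
import torch

def elu_feature_map(x):
    # A positive feature map phi(.) commonly used for linear attention.
    return torch.nn.functional.elu(x) + 1.0

def gated_linear_attention(q, k, v, sent_boundary, gate=0.9):
    """Recurrent (decoding-style) linear attention over one sequence.

    q, k, v:        (T, d) tensors for a single head.
    sent_boundary:  (T,) bool tensor, True where a new sentence starts.
    gate:           scalar in (0, 1); hypothetical fixed decay applied to the
                    running state at sentence boundaries, standing in for the
                    learned sentential gate described in the abstract.
    """
    T, d = q.shape
    phi_q, phi_k = elu_feature_map(q), elu_feature_map(k)

    S = torch.zeros(d, d)   # running sum of phi(k) v^T
    z = torch.zeros(d)      # running sum of phi(k), used for normalization
    outputs = []
    for t in range(T):
        if sent_boundary[t]:
            # Decay the cross-sentence state to favor the current sentence.
            S, z = gate * S, gate * z
        S = S + torch.outer(phi_k[t], v[t])
        z = z + phi_k[t]
        num = phi_q[t] @ S                       # (d,)
        den = (phi_q[t] @ z).clamp(min=1e-6)     # scalar normalizer
        outputs.append(num / den)
    return torch.stack(outputs)                  # (T, d)

# Toy usage: a 6-token "document" with a sentence break before token 3.
T, d = 6, 8
q, k, v = (torch.randn(T, d) for _ in range(3))
boundaries = torch.tensor([False, False, False, True, False, False])
out = gated_linear_attention(q, k, v, boundaries)
print(out.shape)  # torch.Size([6, 8])
```

Because the state `(S, z)` has fixed size regardless of how many tokens have been processed, per-token decoding cost stays constant in the sequence length, which is the scalability property the abstract contrasts with the transformer's quadratic attention.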