Paper Title

Recipes for Adapting Pre-trained Monolingual and Multilingual Models to Machine Translation

Paper Authors

Asa Cooper Stickland, Xian Li, Marjan Ghazvininejad

Paper Abstract

There has been recent success in pre-training on monolingual data and fine-tuning on Machine Translation (MT), but it remains unclear how to best leverage a pre-trained model for a given MT task. This paper investigates the benefits and drawbacks of freezing parameters, and adding new ones, when fine-tuning a pre-trained model on MT. We focus on 1) Fine-tuning a model trained only on English monolingual data, BART. 2) Fine-tuning a model trained on monolingual data from 25 languages, mBART. For BART we get the best performance by freezing most of the model parameters, and adding extra positional embeddings. For mBART we match or outperform the performance of naive fine-tuning for most language pairs with the encoder, and most of the decoder, frozen. The encoder-decoder attention parameters are most important to fine-tune. When constraining ourselves to an out-of-domain training set for Vietnamese to English we see the largest improvements over the fine-tuning baseline.
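
The mBART recipe described above (encoder and most of the decoder frozen, with the encoder-decoder attention fine-tuned) can be illustrated with a short parameter-freezing sketch. This is not the authors' code: it assumes the HuggingFace Transformers mBART implementation, where decoder cross-attention modules are named `encoder_attn`, whereas the paper's experiments were run in fairseq.

```python
# Minimal sketch of the freezing recipe from the abstract: keep the encoder
# and most of the decoder frozen, fine-tune only the encoder-decoder
# (cross-)attention parameters. Assumes HuggingFace Transformers' mBART,
# where cross-attention modules are named "encoder_attn".
from transformers import MBartForConditionalGeneration

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")

for name, param in model.named_parameters():
    # Unfreeze only cross-attention weights (and their layer norms);
    # everything else, including the entire encoder, stays frozen.
    param.requires_grad = "encoder_attn" in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,} parameters "
      f"({100.0 * trainable / total:.1f}%)")
```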
