Paper Title
On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment
Paper Authors
Zirui Wang, Zachary C. Lipton, Yulia Tsvetkov
Paper Abstract
Modern multilingual models are trained on concatenated text from multiple languages in hopes of conferring benefits to each (positive transfer), with the most pronounced benefits accruing to low-resource languages. However, recent work has shown that this approach can degrade performance on high-resource languages, a phenomenon known as negative interference. In this paper, we present the first systematic study of negative interference. We show that, contrary to previous belief, negative interference also impacts low-resource languages. While parameters are maximally shared to learn language-universal structures, we demonstrate that language-specific parameters do exist in multilingual models and that they are a potential cause of negative interference. Motivated by these observations, we also present a meta-learning algorithm that obtains better cross-lingual transferability and alleviates negative interference by adding language-specific layers as meta-parameters and training them in a manner that explicitly improves the shared layers' generalization on all languages. Overall, our results show that negative interference is more common than previously known, suggesting new directions for improving multilingual representations.
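The abstract only sketches the meta-learning treatment, so the following is a minimal, hypothetical PyTorch sketch of the general idea using a first-order approximation: the inner loop adapts shared layers on one language, the outer loop measures generalization to another language through that language's specific layer, and the language-specific layers are updated from that outer loss. All names here (SharedEncoder, LangAdapter, meta_step, inner_lr, outer_lr) and the Reptile-style update of the shared layers are illustrative assumptions, not the authors' implementation.

import copy
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    # Language-universal layers, shared across all languages.
    def __init__(self, dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return self.net(x)

class LangAdapter(nn.Module):
    # Language-specific layer, treated here as a meta-parameter.
    def __init__(self, dim=32):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, h):
        return self.proj(h)

def meta_step(shared, adapters, batch_a, batch_b, inner_lr=1e-2, outer_lr=1e-3):
    # One meta-training step (first-order approximation).
    loss_fn = nn.MSELoss()

    # Inner loop: adapt temporary "fast" copies of the shared layers on language A.
    x_a, y_a = batch_a
    fast = copy.deepcopy(shared)
    loss_a = loss_fn(adapters["a"](fast(x_a)), y_a)
    grads = torch.autograd.grad(loss_a, list(fast.parameters()))
    with torch.no_grad():
        for p, g in zip(fast.parameters(), grads):
            p -= inner_lr * g

    # Outer loop: generalization loss on language B with the adapted shared weights.
    x_b, y_b = batch_b
    loss_b = loss_fn(adapters["b"](fast(x_b)), y_b)

    # First-order meta-update of the language-specific layer for B.
    meta_grads = torch.autograd.grad(loss_b, list(adapters["b"].parameters()))
    with torch.no_grad():
        for p, g in zip(adapters["b"].parameters(), meta_grads):
            p -= outer_lr * g
        # Move the shared layers toward the adapted weights (Reptile-style),
        # rather than fitting them to a single language directly.
        for p, q in zip(shared.parameters(), fast.parameters()):
            p += outer_lr * (q - p)

    return loss_b.item()

# Example usage with random stand-in data:
shared = SharedEncoder()
adapters = {"a": LangAdapter(), "b": LangAdapter()}
make_batch = lambda: (torch.randn(8, 32), torch.randn(8, 32))
print(meta_step(shared, adapters, make_batch(), make_batch()))

The design point this sketch tries to mirror is that the language-specific layers receive gradients only through the cross-lingual generalization loss, so they absorb language-specific signal while the shared layers are pushed toward weights that transfer across languages; the paper's actual architecture and update rules may differ.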