Paper Title
On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment
Paper Authors
Zirui Wang, Zachary C. Lipton, Yulia Tsvetkov
Paper Abstract
Modern multilingual models are trained on concatenated text from multiple languages in hopes of conferring benefits to each (positive transfer), with the most pronounced benefits accruing to low-resource languages. However, recent work has shown that this approach can degrade performance on high-resource languages, a phenomenon known as negative interference. In this paper, we present the first systematic study of negative interference. We show that, contrary to previous belief, negative interference also impacts low-resource languages. While parameters are maximally shared to learn language-universal structures, we demonstrate that language-specific parameters do exist in multilingual models and that they are a potential cause of negative interference. Motivated by these observations, we also present a meta-learning algorithm that obtains better cross-lingual transferability and alleviates negative interference by adding language-specific layers as meta-parameters and training them in a manner that explicitly improves the shared layers' generalization on all languages. Overall, our results show that negative interference is more common than previously known, suggesting new directions for improving multilingual representations.
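The abstract only sketches the meta-learning treatment, so the following is a minimal, hypothetical PyTorch sketch of the general idea using a first-order approximation: the inner loop adapts shared layers on one language, the outer loop measures generalization to another language through that language's specific layer, and the language-specific layers are updated from that outer loss. All names here (SharedEncoder, LangAdapter, meta_step, inner_lr, outer_lr) and the Reptile-style update of the shared layers are illustrative assumptions, not the authors' implementation.

import copy
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    # Language-universal layers, shared across all languages.
    def __init__(self, dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return self.net(x)

class LangAdapter(nn.Module):
    # Language-specific layer, treated here as a meta-parameter.
    def __init__(self, dim=32):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, h):
        return self.proj(h)

def meta_step(shared, adapters, batch_a, batch_b, inner_lr=1e-2, outer_lr=1e-3):
    # One meta-training step (first-order approximation).
    loss_fn = nn.MSELoss()

    # Inner loop: adapt temporary "fast" copies of the shared layers on language A.
    x_a, y_a = batch_a
    fast = copy.deepcopy(shared)
    loss_a = loss_fn(adapters["a"](fast(x_a)), y_a)
    grads = torch.autograd.grad(loss_a, list(fast.parameters()))
    with torch.no_grad():
        for p, g in zip(fast.parameters(), grads):
            p -= inner_lr * g

    # Outer loop: generalization loss on language B with the adapted shared weights.
    x_b, y_b = batch_b
    loss_b = loss_fn(adapters["b"](fast(x_b)), y_b)

    # First-order meta-update of the language-specific layer for B.
    meta_grads = torch.autograd.grad(loss_b, list(adapters["b"].parameters()))
    with torch.no_grad():
        for p, g in zip(adapters["b"].parameters(), meta_grads):
            p -= outer_lr * g
        # Move the shared layers toward the adapted weights (Reptile-style),
        # rather than fitting them to a single language directly.
        for p, q in zip(shared.parameters(), fast.parameters()):
            p += outer_lr * (q - p)

    return loss_b.item()

# Example usage with random stand-in data:
shared = SharedEncoder()
adapters = {"a": LangAdapter(), "b": LangAdapter()}
make_batch = lambda: (torch.randn(8, 32), torch.randn(8, 32))
print(meta_step(shared, adapters, make_batch(), make_batch()))

The design point this sketch tries to mirror is that the language-specific layers receive gradients only through the cross-lingual generalization loss, so they absorb language-specific signal while the shared layers are pushed toward weights that transfer across languages; the paper's actual architecture and update rules may differ.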