Paper Title
InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training
Paper Authors
Paper Abstract
In this work, we present an information-theoretic framework that formulates cross-lingual language model pre-training as maximizing mutual information between multilingual multi-granularity texts. The unified view helps us to better understand the existing methods for learning cross-lingual representations. More importantly, inspired by the framework, we propose a new pre-training task based on contrastive learning. Specifically, we regard a bilingual sentence pair as two views of the same meaning and encourage their encoded representations to be more similar than the negative examples. By leveraging both monolingual and parallel corpora, we jointly train the pretext tasks to improve the cross-lingual transferability of pre-trained models. Experimental results on several benchmarks show that our approach achieves considerably better performance. The code and pre-trained models are available at https://aka.ms/infoxlm.
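To make the contrastive objective concrete, below is a minimal PyTorch sketch of an InfoNCE-style loss over paired sentence representations, using each translation in the batch as the positive and all other target sentences as in-batch negatives. The function name `cross_lingual_contrastive_loss`, the temperature value, and the use of purely in-batch negatives are illustrative assumptions for this sketch, not the paper's exact training recipe.

```python
import torch
import torch.nn.functional as F

def cross_lingual_contrastive_loss(src_repr: torch.Tensor,
                                   tgt_repr: torch.Tensor,
                                   temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style loss: each source sentence representation should be
    closer to its own translation than to the other target sentences in
    the batch (in-batch negatives). Hypothetical helper, not the paper's API."""
    # Normalize so dot products become cosine similarities.
    src = F.normalize(src_repr, dim=-1)   # (batch, hidden)
    tgt = F.normalize(tgt_repr, dim=-1)   # (batch, hidden)
    # Pairwise similarity matrix; the true pairs lie on the diagonal.
    logits = src @ tgt.t() / temperature  # (batch, batch)
    labels = torch.arange(src.size(0), device=src.device)
    return F.cross_entropy(logits, labels)

# Toy usage with random "encoder outputs" for 4 parallel sentence pairs.
src_repr = torch.randn(4, 768)
tgt_repr = torch.randn(4, 768)
print(cross_lingual_contrastive_loss(src_repr, tgt_repr).item())
```

Minimizing this loss is one standard way to lower-bound and thus maximize the mutual information between the two views of a bilingual pair, which is the connection the framework in the abstract draws on.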