Paper Title
Improving Multilingual Neural Machine Translation For Low-Resource Languages: French, English - Vietnamese
Paper Authors
Paper Abstract
Prior work has demonstrated that a low-resource language pair can benefit from multilingual machine translation (MT) systems, which rely on joint training over many language pairs. This paper proposes two simple strategies to address the rare-word issue in multilingual MT systems for two low-resource language pairs: French-Vietnamese and English-Vietnamese. The first strategy dynamically learns the word similarity of tokens in the space shared among the source languages, while the second attempts to improve the translation of rare words by updating their embeddings during training. In addition, we leverage monolingual data in multilingual MT systems to increase the amount of synthetic parallel corpora while dealing with the data sparsity problem. We show significant improvements of up to +1.62 and +2.54 BLEU points over the bilingual baseline systems for both language pairs, and we release our datasets for the research community.