Paper Title

Unsupervised Multilingual Alignment using Wasserstein Barycenter

Paper Authors

Xin Lian, Kshitij Jain, Jakub Truszkowski, Pascal Poupart, Yaoliang Yu

Abstract

We study unsupervised multilingual alignment, the problem of finding word-to-word translations between multiple languages without using any parallel data. One popular strategy is to reduce multilingual alignment to the much simplified bilingual setting, by picking one of the input languages as the pivot language that we transit through. However, it is well-known that transiting through a poorly chosen pivot language (such as English) may severely degrade the translation quality, since the assumed transitive relations among all pairs of languages may not be enforced in the training process. Instead of going through a rather arbitrarily chosen pivot language, we propose to use the Wasserstein barycenter as a more informative "mean" language: it encapsulates information from all languages and minimizes all pairwise transportation costs. We evaluate our method on standard benchmarks and demonstrate state-of-the-art performances.
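To give intuition for the barycenter as a "mean" that minimizes all pairwise transportation costs, here is a minimal toy sketch in one dimension. For uniform empirical measures with the same number of atoms, the Wasserstein-2 barycenter reduces to position-wise averaging of the sorted atoms (quantile averaging). This is only an illustration of the concept; the paper's method operates on high-dimensional word-embedding distributions, and the function name below is a hypothetical helper, not from the authors' code.

```python
# Toy sketch: Wasserstein-2 barycenter of 1D empirical distributions.
# For uniform empirical measures with equal atom counts, atom i of the
# barycenter is the weighted mean of the i-th order statistics of the
# inputs. Illustrative only -- not the paper's multilingual algorithm.

def barycenter_1d(samples, weights=None):
    """samples: list of equal-length lists of floats (empirical measures).
    weights: barycentric weights, uniform by default."""
    k = len(samples)
    n = len(samples[0])
    if weights is None:
        weights = [1.0 / k] * k
    # Sorting aligns each measure with the optimal 1D transport plan.
    sorted_samples = [sorted(s) for s in samples]
    return [sum(w * s[i] for w, s in zip(weights, sorted_samples))
            for i in range(n)]

# Barycenter of two shifted point clouds.
a = [0.0, 1.0, 2.0]
b = [4.0, 2.0, 0.0]   # unsorted on purpose
print(barycenter_1d([a, b]))  # [0.0, 1.5, 3.0]
```

Each barycenter atom sits "between" the corresponding atoms of the input measures, which is the sense in which the barycenter acts as an average language: no single input is privileged as a pivot.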
