Title

Improving Word Translation via Two-Stage Contrastive Learning

Authors

Yaoyiran Li, Fangyu Liu, Nigel Collier, Anna Korhonen, Ivan Vulić

Abstract

Word translation or bilingual lexicon induction (BLI) is a key cross-lingual task, aiming to bridge the lexical gap between different languages. In this work, we propose a robust and effective two-stage contrastive learning framework for the BLI task. At Stage C1, we propose to refine standard cross-lingual linear maps between static word embeddings (WEs) via a contrastive learning objective; we also show how to integrate it into the self-learning procedure for even more refined cross-lingual maps. In Stage C2, we conduct BLI-oriented contrastive fine-tuning of mBERT, unlocking its word translation capability. We also show that static WEs induced from the `C2-tuned' mBERT complement static WEs from Stage C1. Comprehensive experiments on standard BLI datasets for diverse languages and different experimental setups demonstrate substantial gains achieved by our framework. While the BLI method from Stage C1 already yields substantial gains over all state-of-the-art BLI methods in our comparison, even stronger improvements are met with the full two-stage framework: e.g., we report gains for 112/112 BLI setups, spanning 28 language pairs.
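To make the Stage C1 idea concrete, the sketch below pairs the standard closed-form Procrustes solution for a cross-lingual linear map between static word embeddings with a generic InfoNCE-style contrastive objective over translation pairs. This is a minimal illustration under assumptions: the paper's actual contrastive loss, negative sampling, and self-learning loop are not specified in the abstract, so the function names, the temperature `tau`, and the in-batch negatives here are illustrative choices, not the authors' implementation.

```python
import numpy as np

def procrustes_map(X, Y):
    # Closed-form orthogonal map W minimizing ||XW - Y||_F,
    # the standard starting point for cross-lingual linear maps
    # between static word embeddings (WEs).
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def info_nce_loss(X, Y, W, tau=0.1):
    # Generic InfoNCE-style contrastive objective over mapped WEs:
    # each seed translation pair (x_i W, y_i) is a positive, and the
    # other target words in the batch serve as in-batch negatives.
    Z = X @ W
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    logits = Z @ Yn.T / tau                      # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # cross-entropy on true pairs

# Toy check: "target" embeddings are an exact rotation of the "source" ones,
# so the Procrustes map recovers the rotation and the loss is near zero.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 32))                        # source-language WEs
Q = np.linalg.qr(rng.normal(size=(32, 32)))[0]       # random rotation
Y = X @ Q                                            # target-language WEs
W = procrustes_map(X, Y)
print(round(info_nce_loss(X, Y, W), 4))
```

In a contrastive refinement step, the loss above would be minimized over `W` (e.g., by gradient descent with dictionary pairs as positives), rather than solved in closed form; the closed-form map here just provides the initial alignment that Stage C1 refines.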
