Paper Title

Utilizing Language-Image Pretraining for Efficient and Robust Bilingual Word Alignment

Authors

Tuan Dinh, Jy-yong Sohn, Shashank Rajput, Timothy Ossowski, Yifei Ming, Junjie Hu, Dimitris Papailiopoulos, Kangwook Lee

Abstract

Word translation without parallel corpora has become feasible, rivaling the performance of supervised methods. Recent findings have shown that the accuracy and robustness of unsupervised word translation (UWT) can be improved by making use of visual observations, which are universal representations across languages. In this work, we investigate the potential of using not only visual observations but also pretrained language-image models for enabling a more efficient and robust UWT. Specifically, we develop a novel UWT method dubbed Word Alignment using Language-Image Pretraining (WALIP), which leverages visual observations via the shared embedding space of images and texts provided by CLIP models (Radford et al., 2021). WALIP has a two-step procedure. First, we retrieve word pairs with high confidences of similarity, computed using our proposed image-based fingerprints, which define the initial pivot for the word alignment. Second, we apply our robust Procrustes algorithm to estimate the linear mapping between two embedding spaces, which iteratively corrects and refines the estimated alignment. Our extensive experiments show that WALIP improves upon the state-of-the-art performance of bilingual word alignment for a few language pairs across different word embeddings and displays great robustness to the dissimilarity of language pairs or training corpora for two word embeddings.
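The two-step procedure in the abstract can be sketched in a few lines of NumPy. This is a hedged illustration, not the authors' implementation: `image_fingerprints` is a hypothetical stand-in for WALIP's image-based fingerprints (here, each word's cosine-similarity profile against a shared set of CLIP image embeddings), and `procrustes` is the classical orthogonal Procrustes solution rather than the paper's robust, iteratively refined variant.

```python
import numpy as np

def image_fingerprints(word_embs, image_embs):
    # Hypothetical fingerprint: represent each word by its cosine
    # similarities to a shared bank of image embeddings, so words in
    # different languages become comparable via the visual space.
    w = word_embs / np.linalg.norm(word_embs, axis=1, keepdims=True)
    im = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    return w @ im.T  # shape: (num_words, num_images)

def procrustes(X, Y):
    # Classical orthogonal Procrustes: the W minimizing ||X W - Y||_F
    # over orthogonal matrices is U V^T, where U S V^T = svd(X^T Y).
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Toy check: if Y is an orthogonally rotated copy of X (a perfect
# pivot set), Procrustes recovers the rotation exactly.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))              # "source-language" embeddings
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # ground-truth orthogonal map
Y = X @ Q                                 # "target-language" embeddings
W = procrustes(X, Y)
print(np.allclose(X @ W, Y))  # True
```

In the full method, high-confidence pairs found by comparing fingerprints seed the pivot set (X, Y), and the Procrustes estimate is then used to re-match words and refine the mapping over several iterations.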
