Paper Title

IsoVec: Controlling the Relative Isomorphism of Word Embedding Spaces

Authors

Kelly Marchisio, Neha Verma, Kevin Duh, Philipp Koehn

Abstract

The ability to extract high-quality translation dictionaries from monolingual word embedding spaces depends critically on the geometric similarity of the spaces -- their degree of "isomorphism." We address the root cause of faulty cross-lingual mapping: that word embedding training resulted in the underlying spaces being non-isomorphic. We incorporate global measures of isomorphism directly into the Skip-gram loss function, successfully increasing the relative isomorphism of trained word embedding spaces and improving their ability to be mapped to a shared cross-lingual space. The result is improved bilingual lexicon induction in general data conditions, under domain mismatch, and with training algorithm dissimilarities. We release IsoVec at https://github.com/kellymarchisio/isovec.
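
To make the idea of adding an isomorphism term to a training loss concrete, the following is a minimal sketch, not the released IsoVec implementation. It assumes one possible global measure, relational similarity (the Pearson correlation between the pairwise cosine similarities of aligned word vectors in the two spaces), and a hypothetical interpolation weight `beta`; the abstract does not specify either, and the names `relational_similarity` and `combined_loss` are illustrative. A real training loop would need a differentiable version of this penalty.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_cosines(vectors):
    """Cosine similarity for every unordered pair of vectors, in a fixed order."""
    return [cosine(vectors[i], vectors[j])
            for i, j in combinations(range(len(vectors)), 2)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def relational_similarity(src, tgt):
    """Assumed isomorphism measure: src[i] and tgt[i] are a translation pair.

    Returns a value in [-1, 1]; 1 means the two spaces have the same
    pairwise-similarity structure (for example, one is a rotation of the other).
    """
    return pearson(pairwise_cosines(src), pairwise_cosines(tgt))


def combined_loss(skipgram_loss, src, tgt, beta=0.5):
    """Hypothetical interpolation of a Skip-gram loss with an isomorphism penalty."""
    iso_penalty = 1.0 - relational_similarity(src, tgt)
    return (1.0 - beta) * skipgram_loss + beta * iso_penalty


# Toy example: the target space is the source space rotated by 90 degrees.
source = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [-1.0, 2.0]]
rotated = [[-y, x] for x, y in source]
print(relational_similarity(source, rotated))
print(combined_loss(2.0, source, rotated, beta=0.5))
```

Because a rotation preserves all pairwise cosine similarities, the rotated toy space incurs essentially no penalty, which matches the intuition that such spaces can be mapped onto each other by an orthogonal transformation.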
