Paper Title

Moving Down the Long Tail of Word Sense Disambiguation with Gloss-Informed Biencoders

Authors

Terra Blevins, Luke Zettlemoyer

Abstract

A major obstacle in Word Sense Disambiguation (WSD) is that word senses are not uniformly distributed, causing existing models to generally perform poorly on senses that are either rare or unseen during training. We propose a bi-encoder model that independently embeds (1) the target word with its surrounding context and (2) the dictionary definition, or gloss, of each sense. The encoders are jointly optimized in the same representation space, so that sense disambiguation can be performed by finding the nearest sense embedding for each target word embedding. Our system outperforms previous state-of-the-art models on English all-words WSD; these gains predominantly come from improved performance on rare senses, leading to a 31.1% error reduction on less frequent senses over prior work. This demonstrates that rare senses can be more effectively disambiguated by modeling their definitions.
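The nearest-sense lookup described in the abstract can be illustrated with a minimal sketch. It assumes the two encoders have already produced fixed-size vectors, and it assumes dot-product similarity as the notion of "nearest"; the abstract does not specify either detail here. The function names, the toy vectors, and the sense labels are all hypothetical and are not taken from the authors' code.

```python
import numpy as np


def score_senses(target_vec, gloss_vecs):
    """Score each candidate sense by the dot product between the
    target-word embedding and that sense's gloss embedding.
    (Dot product is an assumption of this sketch.)"""
    return gloss_vecs @ target_vec


def disambiguate(target_vec, gloss_vecs, sense_labels):
    """Return the label of the sense whose gloss embedding scores highest."""
    scores = score_senses(target_vec, gloss_vecs)
    return sense_labels[int(np.argmax(scores))]


# Toy stand-ins for encoder outputs (hypothetical values, not real embeddings).
target = np.array([0.9, 0.1, 0.0])       # target word in its context
glosses = np.array([
    [1.0, 0.0, 0.0],                     # gloss embedding for sense 0
    [0.0, 1.0, 0.0],                     # gloss embedding for sense 1
])
labels = ["bank (financial)", "bank (river)"]  # hypothetical sense labels

predicted = disambiguate(target, glosses, labels)
print(predicted)
```

Because every sense is represented through its gloss rather than through a learned per-sense output weight, a sense that is rare or unseen in training still receives an embedding, which is the intuition the abstract gives for the gains on less frequent senses.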
