Paper Title

Lost in Context? On the Sense-wise Variance of Contextualized Word Embeddings

Paper Authors

Yile Wang, Yue Zhang

Paper Abstract

Contextualized word embeddings in language models have given much advance to NLP. Intuitively, sentential information is integrated into the representation of words, which can help model polysemy. However, context sensitivity also leads to the variance of representations, which may break the semantic consistency for synonyms. We quantify how much the contextualized embeddings of each word sense vary across contexts in typical pre-trained models. Results show that contextualized embeddings can be highly consistent across contexts. In addition, part-of-speech, number of word senses, and sentence length have an influence on the variance of sense representations. Interestingly, we find that word representations are position-biased, where the first words in different contexts tend to be more similar. We analyze such a phenomenon and also propose a simple way to alleviate such bias in distance-based word sense disambiguation settings.
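
As a rough illustration of the kind of measurement the abstract describes, the sketch below uses the HuggingFace transformers library to extract contextualized embeddings of one target word from several contexts sharing the same sense and computes their pairwise cosine similarities; high similarity indicates low sense-wise variance. This is a minimal sketch, not the authors' code: the model name, example sentences, the target word "bank", and the first-subword lookup are assumptions made only for illustration.

```python
# Minimal sketch: measure how consistent the contextualized embedding of one
# word sense is across contexts. Assumes `torch` and `transformers` are installed.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed_target(sentence: str, target: str) -> torch.Tensor:
    """Return the last-layer hidden state of the first subword of `target`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # shape: (seq_len, hidden_dim)
    # Locate the first occurrence of the target's first subword in the input ids.
    target_ids = tokenizer(target, add_special_tokens=False)["input_ids"]
    pos = enc["input_ids"][0].tolist().index(target_ids[0])
    return hidden[pos]

# Example contexts sharing the same sense of "bank" (financial institution).
contexts = [
    "She deposited the check at the bank on Monday.",
    "The bank approved his loan application.",
    "He opened a savings account at the local bank.",
]

vecs = torch.stack([embed_target(s, "bank") for s in contexts])
# Pairwise cosine similarities: values near 1 mean the sense representation
# is highly consistent across contexts; lower values mean higher variance.
normed = torch.nn.functional.normalize(vecs, dim=-1)
print(normed @ normed.T)
```

In a distance-based word sense disambiguation setup, the same per-context vectors would be compared against per-sense centroids, which is where the position bias noted in the abstract could skew nearest-centroid decisions.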
