Paper Title
SMSMix: Sense-Maintained Sentence Mixup for Word Sense Disambiguation
Authors
Abstract
Word Sense Disambiguation (WSD) is an NLP task that aims to determine the correct sense of a word in a sentence from a discrete set of sense choices. Although current systems have attained unprecedented performance on this task, the nonuniform distribution of word senses in the training data generally causes systems to perform poorly on rare senses. To this end, we consider data augmentation to increase the frequency of these least frequent senses (LFS) and thereby reduce the distributional bias of senses during training. We propose Sense-Maintained Sentence Mixup (SMSMix), a novel word-level mixup method that maintains the sense of a target word. SMSMix smoothly blends two sentences using mask prediction while preserving the relevant span, determined by saliency scores, to maintain a specific word's sense. To the best of our knowledge, this is the first attempt to apply mixup in NLP while preserving the meaning of a specific word. Through extensive experiments, we validate that our augmentation method effectively provides more information about rare senses during training while maintaining the target sense label.
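The blending step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `select_salient_span` and `mix_with_masks`, the fixed-length window, the midpoint insertion rule, and the `[MASK]` placeholder are all assumptions made for illustration. The sketch only shows the idea of keeping the most salient span around the target word and leaving mask slots at the seams, which a masked language model (not shown here) would then fill.

```python
from typing import List, Tuple


def select_salient_span(saliency: List[float], target_idx: int,
                        span_len: int) -> Tuple[int, int]:
    """Return [start, end) of the window of length span_len that
    contains target_idx and has the highest total saliency.
    (Hypothetical helper; the windowing rule is an assumption.)"""
    n = len(saliency)
    if not 0 <= target_idx < n:
        raise ValueError("target_idx out of range")
    span_len = max(1, min(span_len, n))
    # Every valid window start must keep target_idx inside the window.
    lo = max(0, target_idx - span_len + 1)
    hi = min(target_idx, n - span_len)
    best = max(range(lo, hi + 1),
               key=lambda s: sum(saliency[s:s + span_len]))
    return best, best + span_len


def mix_with_masks(rare_tokens: List[str], context_tokens: List[str],
                   saliency: List[float], target_idx: int, span_len: int,
                   n_masks: int = 1, mask_token: str = "[MASK]") -> List[str]:
    """Insert the salient span of the rare-sense sentence into the middle
    of another sentence, padded with mask slots on both sides.
    (Hypothetical helper; the insertion point is an assumption.)"""
    if len(rare_tokens) != len(saliency):
        raise ValueError("one saliency score per token is required")
    start, end = select_salient_span(saliency, target_idx, span_len)
    kept = rare_tokens[start:end]          # span that carries the sense
    cut = len(context_tokens) // 2         # assumed insertion point
    masks = [mask_token] * n_masks         # to be filled by a masked LM
    return context_tokens[:cut] + masks + kept + masks + context_tokens[cut:]


# Usage with made-up saliency scores; the target word is "bank" (index 5).
rare = ["he", "sat", "on", "the", "river", "bank", "quietly"]
scores = [0.1, 0.2, 0.1, 0.3, 0.9, 1.0, 0.1]
other = ["she", "walked", "home", "slowly"]
mixed = mix_with_masks(rare, other, scores, target_idx=5, span_len=3)
print(mixed)
# ['she', 'walked', '[MASK]', 'the', 'river', 'bank', '[MASK]', 'home', 'slowly']
```

Because the target word always stays inside the kept span, the mixed sentence can reuse the original sense label, which is the property the abstract emphasizes.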