Paper Title
Improving Contextual Representation with Gloss Regularized Pre-training
Paper Authors
Paper Abstract
Though achieving impressive results on many NLP tasks, BERT-like masked language models (MLMs) suffer from a discrepancy between pre-training and inference. In light of this gap, we investigate the contextual representations of pre-training and inference from the perspective of word probability distributions. We find that BERT risks neglecting contextual word similarity during pre-training. To tackle this issue, we propose an auxiliary gloss regularizer module for BERT pre-training (GR-BERT) to enhance word semantic similarity. By simultaneously predicting masked words and aligning their contextual embeddings to the corresponding glosses, word similarity can be modeled explicitly. We design two architectures for GR-BERT and evaluate our model on downstream tasks. Experimental results show that the gloss regularizer benefits BERT in word-level and sentence-level semantic representation. GR-BERT achieves a new state of the art on the lexical substitution task and greatly improves BERT sentence representations on both unsupervised and supervised STS tasks.
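
The abstract describes pre-training with two objectives at once: the usual masked-word prediction and an alignment between each masked token's contextual embedding and an embedding of its dictionary gloss. The sketch below is a minimal, hypothetical PyTorch illustration of such a combined loss; the class name GlossRegularizedMLM, the cosine-based alignment term, and the weight lambda_gloss are illustrative assumptions, not the paper's actual formulation (the paper's two GR-BERT architectures are not detailed in this abstract).

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GlossRegularizedMLM(nn.Module):
        """Toy sketch: masked-word prediction plus a gloss-alignment regularizer."""
        def __init__(self, encoder, hidden_size, vocab_size, lambda_gloss=0.1):
            super().__init__()
            self.encoder = encoder              # stand-in for a BERT-style contextual encoder
            self.mlm_head = nn.Linear(hidden_size, vocab_size)
            self.lambda_gloss = lambda_gloss    # weight of the auxiliary gloss term (assumed)

        def forward(self, input_ids, masked_positions, masked_labels, gloss_embeddings):
            hidden = self.encoder(input_ids)    # (batch, seq_len, hidden)
            # Gather contextual embeddings at the masked positions: (batch, n_masked, hidden)
            idx = masked_positions.unsqueeze(-1).expand(-1, -1, hidden.size(-1))
            masked_hidden = hidden.gather(1, idx)
            # Standard MLM cross-entropy over the masked positions.
            logits = self.mlm_head(masked_hidden)
            mlm_loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                                       masked_labels.reshape(-1))
            # Gloss regularizer: pull each masked token's contextual embedding toward
            # the embedding of its gloss (cosine alignment is an assumption here).
            gloss_loss = 1.0 - F.cosine_similarity(masked_hidden, gloss_embeddings, dim=-1).mean()
            return mlm_loss + self.lambda_gloss * gloss_loss

    # Toy usage, with an embedding layer standing in for the real contextual encoder.
    vocab_size, hidden_size = 1000, 64
    model = GlossRegularizedMLM(nn.Embedding(vocab_size, hidden_size), hidden_size, vocab_size)
    input_ids = torch.randint(0, vocab_size, (2, 16))
    masked_positions = torch.tensor([[3, 7], [1, 9]])
    masked_labels = torch.randint(0, vocab_size, (2, 2))
    gloss_embeddings = torch.randn(2, 2, hidden_size)   # e.g. encoded dictionary definitions
    loss = model(input_ids, masked_positions, masked_labels, gloss_embeddings)

In this reading, the gloss term acts purely as a regularizer on the contextual representation space, so it can be dropped at inference time and the encoder used exactly like a standard BERT.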