Paper Title
Improving Contextual Representation with Gloss Regularized Pre-training
Paper Authors
Paper Abstract
Though achieving impressive results on many NLP tasks, BERT-like masked language models (MLMs) suffer from a discrepancy between pre-training and inference. In light of this gap, we investigate the contextual representations of pre-training and inference from the perspective of word probability distributions. We find that BERT risks neglecting contextual word similarity during pre-training. To tackle this issue, we propose an auxiliary gloss regularizer module for BERT pre-training (GR-BERT) to enhance word semantic similarity. By simultaneously predicting masked words and aligning their contextual embeddings to the corresponding glosses, word similarity can be modeled explicitly. We design two architectures for GR-BERT and evaluate our model on downstream tasks. Experimental results show that the gloss regularizer benefits BERT in word-level and sentence-level semantic representation. GR-BERT achieves a new state of the art on the lexical substitution task and greatly improves BERT sentence representations on both unsupervised and supervised STS tasks.
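
The abstract describes pre-training with two objectives at once: the usual masked-word prediction and an alignment between each masked token's contextual embedding and an embedding of its dictionary gloss. The sketch below is a minimal, hypothetical PyTorch illustration of such a combined loss; the class name GlossRegularizedMLM, the cosine-based alignment term, and the weight lambda_gloss are illustrative assumptions, not the paper's actual formulation (the paper's two GR-BERT architectures are not detailed in this abstract).

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GlossRegularizedMLM(nn.Module):
        """Toy sketch: masked-word prediction plus a gloss-alignment regularizer."""
        def __init__(self, encoder, hidden_size, vocab_size, lambda_gloss=0.1):
            super().__init__()
            self.encoder = encoder              # stand-in for a BERT-style contextual encoder
            self.mlm_head = nn.Linear(hidden_size, vocab_size)
            self.lambda_gloss = lambda_gloss    # weight of the auxiliary gloss term (assumed)

        def forward(self, input_ids, masked_positions, masked_labels, gloss_embeddings):
            hidden = self.encoder(input_ids)    # (batch, seq_len, hidden)
            # Gather contextual embeddings at the masked positions: (batch, n_masked, hidden)
            idx = masked_positions.unsqueeze(-1).expand(-1, -1, hidden.size(-1))
            masked_hidden = hidden.gather(1, idx)
            # Standard MLM cross-entropy over the masked positions.
            logits = self.mlm_head(masked_hidden)
            mlm_loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                                       masked_labels.reshape(-1))
            # Gloss regularizer: pull each masked token's contextual embedding toward
            # the embedding of its gloss (cosine alignment is an assumption here).
            gloss_loss = 1.0 - F.cosine_similarity(masked_hidden, gloss_embeddings, dim=-1).mean()
            return mlm_loss + self.lambda_gloss * gloss_loss

    # Toy usage, with an embedding layer standing in for the real contextual encoder.
    vocab_size, hidden_size = 1000, 64
    model = GlossRegularizedMLM(nn.Embedding(vocab_size, hidden_size), hidden_size, vocab_size)
    input_ids = torch.randint(0, vocab_size, (2, 16))
    masked_positions = torch.tensor([[3, 7], [1, 9]])
    masked_labels = torch.randint(0, vocab_size, (2, 2))
    gloss_embeddings = torch.randn(2, 2, hidden_size)   # e.g. encoded dictionary definitions
    loss = model(input_ids, masked_positions, masked_labels, gloss_embeddings)

In this reading, the gloss term acts purely as a regularizer on the contextual representation space, so it can be dropped at inference time and the encoder used exactly like a standard BERT.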