Paper Title

Improving BERT with Self-Supervised Attention

Paper Authors

Yiren Chen, Xiaoyu Kou, Jiangang Bai, Yunhai Tong

Abstract

One of the most popular paradigms for applying a large pre-trained NLP model such as BERT is to fine-tune it on a smaller dataset. However, one challenge remains: the fine-tuned model often overfits on smaller datasets. A symptom of this phenomenon is that irrelevant or misleading words in a sentence, which are easy for human beings to understand, can substantially degrade the performance of these fine-tuned BERT models. In this paper, we propose a novel technique, called Self-Supervised Attention (SSA), to help address this generalization challenge. Specifically, SSA automatically generates weak, token-level attention labels iteratively by probing the fine-tuned model from the previous iteration. We investigate two different ways of integrating SSA into BERT and propose a hybrid approach to combine their benefits. Empirically, across a variety of public datasets, we demonstrate significant performance improvements with our SSA-enhanced BERT model.
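For readers who want a concrete picture of the probing step mentioned in the abstract, below is a minimal, illustrative Python sketch of how weak, token-level attention labels could be generated by probing a fine-tuned classifier: each token is masked in turn and labeled informative if the model's confidence in its original prediction drops noticeably. The function name `weak_attention_labels`, the `predict_proba` callable, and the `threshold` value are assumptions made for illustration only; the paper's actual SSA procedure may differ.

```python
# Illustrative sketch only: assumes a hypothetical `predict_proba(tokens)` callable that
# returns the fine-tuned model's class probabilities for a tokenized sentence. The real
# SSA procedure in the paper may probe tokens and smooth labels differently.

from typing import Callable, List, Sequence

def weak_attention_labels(
    tokens: Sequence[str],
    predict_proba: Callable[[Sequence[str]], List[float]],
    mask_token: str = "[MASK]",
    threshold: float = 0.05,
) -> List[int]:
    """Assign a weak 0/1 attention label to each token by probing the model.

    A token is labeled 1 ("informative") if masking it lowers the probability of the
    originally predicted class by more than `threshold`.
    """
    base = predict_proba(tokens)
    top_class = max(range(len(base)), key=lambda c: base[c])
    labels = []
    for i in range(len(tokens)):
        probed = list(tokens)
        probed[i] = mask_token  # hide one token at a time
        drop = base[top_class] - predict_proba(probed)[top_class]
        labels.append(1 if drop > threshold else 0)
    return labels

# Toy usage with a stand-in "model": pretends the word "great" drives a positive prediction.
def toy_predict(tokens: Sequence[str]) -> List[float]:
    p_pos = 0.9 if "great" in tokens else 0.4
    return [1.0 - p_pos, p_pos]

print(weak_attention_labels(["the", "movie", "was", "great"], toy_predict))
# -> [0, 0, 0, 1]: only "great" is marked as an informative token
```

In the toy run, only the sentiment-bearing word "great" receives a label of 1; this is the kind of weak, token-level supervision the abstract describes producing in each iteration and feeding into the next round of fine-tuning.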
