Paper Title

Learning to Win Lottery Tickets in BERT Transfer via Task-agnostic Mask Training

Paper Authors

Yuanxin Liu, Fandong Meng, Zheng Lin, Peng Fu, Yanan Cao, Weiping Wang, Jie Zhou

Paper Abstract

Recent studies on the lottery ticket hypothesis (LTH) show that pre-trained language models (PLMs) like BERT contain matching subnetworks that have similar transfer learning performance as the original PLM. These subnetworks are found using magnitude-based pruning. In this paper, we find that the BERT subnetworks have even more potential than these studies have shown. Firstly, we discover that the success of magnitude pruning can be attributed to the preserved pre-training performance, which correlates with the downstream transferability. Inspired by this, we propose to directly optimize the subnetwork structure towards the pre-training objectives, which can better preserve the pre-training performance. Specifically, we train binary masks over model weights on the pre-training tasks, with the aim of preserving the universal transferability of the subnetwork, which is agnostic to any specific downstream tasks. We then fine-tune the subnetworks on the GLUE benchmark and the SQuAD dataset. The results show that, compared with magnitude pruning, mask training can effectively find BERT subnetworks with improved overall performance on downstream tasks. Moreover, our method is also more efficient in searching subnetworks and more advantageous when fine-tuning within a certain range of data scarcity. Our code is available at https://github.com/llyx97/TAMT.
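
The abstract's central technique, training a binary mask over frozen pre-trained weights while optimizing the pre-training objective, can be illustrated with a minimal PyTorch sketch. The names below (BinarizeSTE, MaskedLinear, mask_scores, sparsity) and the choice of top-k binarization with a straight-through estimator are illustrative assumptions, not the paper's exact implementation; see the TAMT repository linked above for the authors' code.

```python
import torch
import torch.nn as nn


class BinarizeSTE(torch.autograd.Function):
    """Binarize real-valued mask scores to {0, 1}; pass gradients straight through."""

    @staticmethod
    def forward(ctx, scores, sparsity):
        # Keep the top (1 - sparsity) fraction of scores, zero out the rest.
        k = max(1, int((1.0 - sparsity) * scores.numel()))
        threshold = torch.topk(scores.flatten(), k).values.min()
        return (scores >= threshold).float()

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: the gradient w.r.t. the scores is passed unchanged.
        return grad_output, None


class MaskedLinear(nn.Module):
    """Linear layer with frozen pre-trained weights and a trainable binary mask."""

    def __init__(self, linear: nn.Linear, sparsity: float = 0.5):
        super().__init__()
        # Pre-trained weights are frozen; only the mask scores are trained.
        self.weight = nn.Parameter(linear.weight.detach().clone(), requires_grad=False)
        self.bias = nn.Parameter(linear.bias.detach().clone(), requires_grad=False)
        self.mask_scores = nn.Parameter(torch.rand_like(self.weight))
        self.sparsity = sparsity

    def forward(self, x):
        mask = BinarizeSTE.apply(self.mask_scores, self.sparsity)
        return nn.functional.linear(x, self.weight * mask, self.bias)
```

In this sketch, such masked layers would replace the linear layers of a pre-trained BERT, and only mask_scores would be updated against the pre-training (masked language modeling) loss; the resulting binary mask defines the subnetwork that is then fine-tuned on the downstream tasks.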
