Paper Title
From Mimicking to Integrating: Knowledge Integration for Pre-Trained Language Models
Paper Authors
Paper Abstract
Investigating better ways to reuse the released pre-trained language models (PLMs) can significantly reduce the computational cost and the potential environmental side-effects. This paper explores a novel PLM reuse paradigm, Knowledge Integration (KI). Without human annotations available, KI aims to merge the knowledge from different teacher PLMs, each of which specializes in a different classification problem, into a versatile student model. To achieve this, we first derive the correlation between virtual golden supervision and teacher predictions. We then design a Model Uncertainty-aware Knowledge Integration (MUKI) framework to recover the golden supervision for the student. Specifically, MUKI adopts Monte-Carlo Dropout to estimate model uncertainty for the supervision integration. An instance-wise re-weighting mechanism based on the margin of uncertainty scores is further incorporated to deal with potentially conflicting supervision from the teachers. Experimental results demonstrate that MUKI achieves substantial improvements over baselines on benchmark datasets. Further analysis shows that MUKI generalizes well to merging teacher models with heterogeneous architectures, and even teachers majoring in cross-lingual datasets.
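To make the abstract's pipeline more concrete, below is a minimal PyTorch sketch of the two ingredients it names: Monte-Carlo Dropout uncertainty estimation and a margin-based, instance-wise re-weighting of two teachers' predictions over the union of their label sets. This is not the authors' implementation; the helper names (`mc_dropout_probs`, `integrate_supervision`), the `temperature` parameter, the sigmoid-of-margin weighting, and the HuggingFace-style `model(**inputs).logits` call are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def mc_dropout_probs(model, inputs, num_samples=8):
    """Average class probabilities over several stochastic forward
    passes with dropout kept active (Monte-Carlo Dropout)."""
    model.train()  # keep dropout layers active at inference time
    with torch.no_grad():
        probs = torch.stack([
            F.softmax(model(**inputs).logits, dim=-1)  # assumes a HuggingFace-style classifier
            for _ in range(num_samples)
        ])
    return probs.mean(dim=0)  # [batch, num_teacher_classes]


def predictive_entropy(probs, eps=1e-12):
    """Entropy of the MC-averaged distribution as an uncertainty score."""
    return -(probs * (probs + eps).log()).sum(dim=-1)  # [batch]


def integrate_supervision(probs_a, probs_b, temperature=1.0):
    """Build soft targets over the union of two teachers' label sets.

    The less-uncertain teacher dominates each instance, and the margin
    between the two uncertainty scores controls how strongly it does so
    (a rough stand-in for MUKI's instance-wise re-weighting, not the
    paper's exact formula).
    """
    u_a = predictive_entropy(probs_a)
    u_b = predictive_entropy(probs_b)
    margin = (u_a - u_b).abs()                       # per-instance confidence gap
    weight = torch.sigmoid(margin / temperature)     # larger gap -> trust the confident teacher more

    a_more_certain = (u_a < u_b).float().unsqueeze(-1)
    scale_a = a_more_certain * weight.unsqueeze(-1) + (1 - a_more_certain) * (1 - weight.unsqueeze(-1))
    scale_b = 1.0 - scale_a

    # Concatenate the scaled distributions over the label union and renormalize.
    targets = torch.cat([probs_a * scale_a, probs_b * scale_b], dim=-1)
    return targets / targets.sum(dim=-1, keepdim=True)
```

Under these assumptions, the student (whose classification head spans the union of the teachers' label sets) would then be trained by minimizing a KL divergence between its softmax outputs and the integrated soft targets, standing in for the recovered golden supervision described in the abstract.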