Paper Title
Lifelong Language Knowledge Distillation
Paper Authors
Paper Abstract
It is challenging to perform lifelong language learning (LLL) on a stream of different tasks without any performance degradation compared to the multi-task counterparts. To address this issue, we present Lifelong Language Knowledge Distillation (L2KD), a simple but efficient method that can be easily applied to existing LLL architectures in order to mitigate the degradation. Specifically, when the LLL model is trained on a new task, we assign a teacher model to first learn the new task, and pass the knowledge to the LLL model via knowledge distillation. Therefore, the LLL model can better adapt to the new task while keeping the previously learned knowledge. Experiments show that the proposed L2KD consistently improves previous state-of-the-art models, and the degradation compared to multi-task models in LLL tasks is well mitigated for both sequence generation and text classification tasks.
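To make the distillation step described above concrete, the following is a minimal PyTorch sketch, not the authors' implementation: the `distillation_loss` function, the `temperature` and `alpha` hyperparameters, and the random tensors standing in for teacher and student outputs are all illustrative assumptions. It only shows the generic idea of transferring a trained teacher's soft targets to the lifelong (student) model while still fitting the new task's gold labels.

```python
# Hypothetical sketch of teacher-to-student knowledge distillation on a new task.
# Assumes the teacher has already been trained on the new task and is frozen.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=2.0, alpha=0.5):
    """Mix cross-entropy on gold labels with KL divergence between the
    temperature-softened teacher and student distributions."""
    # Hard-label loss on the new task's gold targets.
    ce = F.cross_entropy(student_logits, targets)
    # Soft-label loss: match the teacher's softened output distribution.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return alpha * ce + (1 - alpha) * kd

# Illustrative usage with random tensors standing in for model outputs.
if __name__ == "__main__":
    batch, num_classes = 4, 10
    student_logits = torch.randn(batch, num_classes, requires_grad=True)
    teacher_logits = torch.randn(batch, num_classes)  # frozen teacher, no grad
    targets = torch.randint(0, num_classes, (batch,))
    loss = distillation_loss(student_logits, teacher_logits, targets)
    loss.backward()
    print(f"combined distillation loss: {loss.item():.4f}")
```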