Paper Title
Life-long Learning for Multilingual Neural Machine Translation with Knowledge Distillation
Paper Authors
Paper Abstract
A common scenario of Multilingual Neural Machine Translation (MNMT) is that each translation task arrives in a sequential manner, and the training data of previous tasks is unavailable. In this scenario, the current methods suffer heavily from catastrophic forgetting (CF). To alleviate the CF, we investigate knowledge distillation based life-long learning methods. Specifically, in the one-to-many scenario, we propose a multilingual distillation method to make the new model (student) jointly learn multilingual output from the old model (teacher) and the new task. In the many-to-one scenario, we find that direct distillation faces the extreme partial distillation problem, and we propose two different methods to address it: pseudo input distillation and reverse teacher distillation. The experimental results on twelve translation tasks show that the proposed methods can better consolidate the previous knowledge and sharply alleviate the CF.
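The abstract does not spell out the training objective, but a typical distillation-based life-long learning setup combines a cross-entropy loss on the new task with a KL term that keeps the student close to the old (teacher) model's output distribution. The sketch below is a minimal illustration under that assumption; the function name, weighting scheme, and temperature are hypothetical and not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, target_ids,
                      alpha=0.5, T=1.0, pad_id=0):
    """Hypothetical sketch: fit the new translation task while staying
    close to the old model's predictions to reduce catastrophic forgetting."""
    vocab_size = student_logits.size(-1)

    # Cross-entropy on the new translation task (new language pair).
    ce = F.cross_entropy(
        student_logits.view(-1, vocab_size),
        target_ids.view(-1),
        ignore_index=pad_id,
    )

    # KL divergence between the teacher's and student's token
    # distributions, preserving knowledge of previous tasks.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    # alpha trades off new-task learning against knowledge retention.
    return (1.0 - alpha) * ce + alpha * kd
```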