Paper Title

I2D2: Inductive Knowledge Distillation with NeuroLogic and Self-Imitation

Authors

Chandra Bhagavatula, Jena D. Hwang, Doug Downey, Ronan Le Bras, Ximing Lu, Lianhui Qin, Keisuke Sakaguchi, Swabha Swayamdipta, Peter West, Yejin Choi

Abstract

Commonsense capabilities of pre-trained language models dramatically improve with scale, leading many to believe that scale is the only winning recipe. But is it? Here, we investigate an alternative that a priori seems impossible: can smaller language models (e.g., GPT-2) win over models that are orders of magnitude larger and better (e.g., GPT-3), if powered with novel commonsense distillation algorithms? The key intellectual challenge is to design a learning algorithm that achieves a competitive level of commonsense acquisition without relying on the benefits of scale. In particular, we study generative models of commonsense knowledge, focusing on the task of generating generics: statements of commonsense facts about everyday concepts, e.g., birds can fly. We introduce I2D2, a novel commonsense distillation framework that loosely follows the Symbolic Knowledge Distillation of West et al. but breaks the dependence on the extreme-scale teacher model with two innovations: (1) a novel adaptation of NeuroLogic Decoding to enhance the generation quality of weak, off-the-shelf language models, and (2) self-imitation learning to iteratively learn from the model's own enhanced commonsense acquisition capabilities. Empirical results suggest that scale is not the only way, as novel algorithms can be a promising alternative. Moreover, our study leads to a new corpus of generics, Gen-A-tomic, that is the largest and highest-quality resource of its kind available to date.
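The sketch below illustrates the two ideas named in the abstract in simplified form; it is not the authors' implementation. HuggingFace's `force_words_ids` constrained beam search stands in, loosely, for NeuroLogic Decoding, and `critic_score` is a placeholder for the paper's supervised critic. The prompt, concept, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch of the I2D2-style loop: constrained generation -> critic
# filtering -> self-imitation fine-tuning on the model's own accepted outputs.
# Assumptions: GPT-2 as the weak off-the-shelf LM; `force_words_ids` as a
# rough proxy for NeuroLogic's lexical constraints; a dummy critic.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def generate_candidates(prompt, concept, num_beams=8, num_return_sequences=4):
    """Generate candidate generics, forcing `concept` to appear in the output.

    This is a much weaker form of constraint than NeuroLogic Decoding's
    logical constraints, used here only to convey the idea.
    """
    inputs = tokenizer(prompt, return_tensors="pt")
    # Token ids of the word (with leading space, as GPT-2 BPE expects mid-text).
    force_words_ids = [tokenizer(" " + concept).input_ids]
    outputs = model.generate(
        **inputs,
        force_words_ids=force_words_ids,
        num_beams=num_beams,
        num_return_sequences=num_return_sequences,
        max_new_tokens=16,
        no_repeat_ngram_size=2,
        pad_token_id=tokenizer.eos_token_id,
    )
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

def critic_score(statement):
    """Placeholder critic. The paper trains a supervised acceptability
    classifier; here we merely keep sentence-like strings (assumption)."""
    return 1.0 if statement.strip().endswith(".") else 0.0

def self_imitation_step(statements, lr=5e-5):
    """One round of self-imitation: fine-tune the model on the subset of
    its own generations that the critic accepted."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for text in statements:
        batch = tokenizer(text, return_tensors="pt")
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    model.eval()

# Iterate: each round generates from the current model, filters with the
# critic, and fine-tunes on the survivors, so later rounds imitate the
# model's own improved outputs.
for round_idx in range(2):
    candidates = generate_candidates("Statement of fact: birds", "fly")
    accepted = [c for c in candidates if critic_score(c) > 0.5]
    if accepted:
        self_imitation_step(accepted)
```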
