Paper Title

Progressive Network Grafting for Few-Shot Knowledge Distillation

Paper Authors

Chengchao Shen, Xinchao Wang, Youtan Yin, Jie Song, Sihui Luo, Mingli Song

Paper Abstract

Knowledge distillation has demonstrated encouraging performances in deep model compression. Most existing approaches, however, require massive labeled data to accomplish the knowledge transfer, making the model compression a cumbersome and costly process. In this paper, we investigate the practical few-shot knowledge distillation scenario, where we assume only a few samples without human annotations are available for each category. To this end, we introduce a principled dual-stage distillation scheme tailored for few-shot data. In the first step, we graft the student blocks one by one onto the teacher, and learn the parameters of the grafted block intertwined with those of the other teacher blocks. In the second step, the trained student blocks are progressively connected and then together grafted onto the teacher network, allowing the learned student blocks to adapt themselves to each other and eventually replace the teacher network. Experiments demonstrate that our approach, with only a few unlabeled samples, achieves gratifying results on CIFAR10, CIFAR100, and ILSVRC-2012. On CIFAR10 and CIFAR100, our performances are even on par with those of knowledge distillation schemes that utilize the full datasets. The source code is available at https://github.com/zju-vipa/NetGraft.
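
The sketch below illustrates the dual-stage grafting idea described in the abstract, assuming both networks are split into lists of blocks with matching input/output shapes at each boundary. The KL-based distillation loss, the Adam optimizer, and all hyperparameters are assumptions for illustration only, not the authors' implementation; see the repository linked above for the original code.

```python
# Illustrative sketch only: block boundaries, loss, and hyperparameters are
# assumptions; the authors' code lives at https://github.com/zju-vipa/NetGraft.
import torch
import torch.nn as nn
import torch.nn.functional as F


def kd_loss(hybrid_logits, teacher_logits, T=4.0):
    """Soft-label distillation loss between the hybrid network and the teacher."""
    p_t = F.softmax(teacher_logits / T, dim=1)
    log_p_h = F.log_softmax(hybrid_logits / T, dim=1)
    return F.kl_div(log_p_h, p_t, reduction="batchmean") * (T * T)


def distill(hybrid, teacher, trainable_params, loader, epochs=10, lr=1e-3):
    """Train only `trainable_params` so the hybrid mimics the frozen teacher
    on a handful of unlabeled samples."""
    opt = torch.optim.Adam(trainable_params, lr=lr)
    for _ in range(epochs):
        for x in loader:  # loader yields unlabeled few-shot inputs
            with torch.no_grad():
                t_out = teacher(x)
            loss = kd_loss(hybrid(x), t_out)
            opt.zero_grad()
            loss.backward()
            opt.step()


def dual_stage_grafting(teacher_blocks, student_blocks, loader):
    """teacher_blocks / student_blocks: lists of nn.Module whose outputs have
    matching shapes at each block boundary (an assumption of this sketch)."""
    teacher = nn.Sequential(*teacher_blocks).eval()
    for p in teacher.parameters():
        p.requires_grad_(False)

    # Stage 1: graft each student block into the teacher and train it alone,
    # with all remaining blocks taken from the frozen teacher.
    n = len(teacher_blocks)
    for i in range(n):
        hybrid = nn.Sequential(*teacher_blocks[:i], student_blocks[i],
                               *teacher_blocks[i + 1:])
        distill(hybrid, teacher, student_blocks[i].parameters(), loader)

    # Stage 2: progressively connect the trained student blocks and graft them
    # together onto the teacher so they adapt to each other; when k == n the
    # student has fully replaced the teacher.
    for k in range(2, n + 1):
        hybrid = nn.Sequential(*student_blocks[:k], *teacher_blocks[k:])
        params = [p for b in student_blocks[:k] for p in b.parameters()]
        distill(hybrid, teacher, params, loader)

    return nn.Sequential(*student_blocks)
```

Here the teacher is the pretrained network being compressed, each student block is a lightweight counterpart of the corresponding teacher block, and the loader holds only the few unlabeled samples per class assumed in the paper's setting.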
