Paper Title

CDFKD-MFS: Collaborative Data-free Knowledge Distillation via Multi-level Feature Sharing

Paper Authors

Zhiwei Hao, Yong Luo, Zhi Wang, Han Hu, Jianping An

Paper Abstract


Recently, the compression and deployment of powerful deep neural networks (DNNs) on resource-limited edge devices to provide intelligent services have become attractive tasks. Although knowledge distillation (KD) is a feasible solution for compression, its requirement for the original dataset raises privacy concerns. In addition, it is common to integrate multiple pretrained models to achieve satisfactory performance. How to compress multiple models into a tiny model is challenging, especially when the original data are unavailable. To tackle this challenge, we propose a framework termed collaborative data-free knowledge distillation via multi-level feature sharing (CDFKD-MFS), which consists of a multi-header student module, an asymmetric adversarial data-free KD module, and an attention-based aggregation module. In this framework, the student model equipped with a multi-level feature-sharing structure learns from multiple teacher models and is trained together with a generator in an asymmetric adversarial manner. When some real samples are available, the attention module adaptively aggregates predictions of the student headers, which can further improve performance. We conduct extensive experiments on three popular computer vision datasets. In particular, compared with the most competitive alternative, the accuracy of the proposed framework is 1.18% higher on the CIFAR-100 dataset, 1.67% higher on the Caltech-101 dataset, and 2.99% higher on the mini-ImageNet dataset.
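
The components named in the abstract (a shared student backbone, one prediction head per teacher, and an attention module that fuses the heads' outputs) can be pictured with a minimal PyTorch sketch. All module names, layer sizes, and the weighted-sum aggregation rule below are illustrative assumptions for exposition, not the authors' released implementation.

# Minimal sketch of a multi-header student with attention-based aggregation.
# Hypothetical layer sizes; not the CDFKD-MFS reference code.
import torch
import torch.nn as nn

class MultiHeadStudent(nn.Module):
    """Shared feature extractor followed by one classification head per teacher."""

    def __init__(self, num_classes: int, num_teachers: int, feat_dim: int = 128):
        super().__init__()
        # Shared low/mid-level features (the "multi-level feature sharing" idea, simplified).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(64 * 4 * 4, feat_dim), nn.ReLU(),
        )
        # One lightweight head per teacher model.
        self.heads = nn.ModuleList(
            [nn.Linear(feat_dim, num_classes) for _ in range(num_teachers)]
        )
        # Attention module: scores each head from the shared feature.
        self.attn = nn.Linear(feat_dim, num_teachers)

    def forward(self, x):
        feat = self.backbone(x)
        logits = torch.stack([head(feat) for head in self.heads], dim=1)  # (B, T, C)
        weights = torch.softmax(self.attn(feat), dim=1).unsqueeze(-1)     # (B, T, 1)
        aggregated = (weights * logits).sum(dim=1)                        # (B, C)
        return logits, aggregated

# Usage: each head would be distilled from its own teacher on generator-synthesized
# inputs; the aggregated output is what the attention module refines when a few real
# samples are available, as described in the abstract.
model = MultiHeadStudent(num_classes=100, num_teachers=3)
per_head_logits, fused_logits = model(torch.randn(2, 3, 32, 32))
print(per_head_logits.shape, fused_logits.shape)  # (2, 3, 100) and (2, 100)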
