Paper Title
Online Knowledge Distillation via Multi-branch Diversity Enhancement
Paper Authors
Paper Abstract
Knowledge distillation is an effective method for transferring knowledge from a cumbersome teacher model to a lightweight student model. Online knowledge distillation uses the ensemble predictions of multiple student models as soft targets to train each student model. However, the homogenization problem makes it difficult to improve model performance further. In this work, we propose a new distillation method to enhance the diversity among multiple student models. We introduce the Feature Fusion Module (FFM), which improves the performance of the attention mechanism in the network by integrating the rich semantic information contained in the last block of multiple student models. Furthermore, we use a Classifier Diversification (CD) loss function to strengthen the differences between student models and deliver a better ensemble result. Extensive experiments show that our method significantly enhances the diversity among student models and brings better distillation performance. We evaluate our method on three image classification datasets: CIFAR-10/100 and CINIC-10. The results show that our method achieves state-of-the-art performance on these datasets.
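As a rough illustration of the training objective described in the abstract, below is a minimal PyTorch-style sketch of online distillation toward ensemble soft targets, with a pairwise similarity penalty standing in for the idea of the CD loss. The function name online_kd_loss, the logit-averaging ensemble, the temperature, and the loss weighting are illustrative assumptions rather than the paper's exact formulation, and the FFM is omitted entirely.

import torch
import torch.nn.functional as F

def online_kd_loss(student_logits, labels, temperature=3.0, cd_weight=0.1):
    """Hypothetical sketch of an online-KD objective with ensemble soft targets.

    student_logits: list of [batch, num_classes] tensors, one per student branch.
    The ensemble target, temperature, and diversity term are illustrative choices.
    """
    # Hard-label cross-entropy for every student branch.
    ce = sum(F.cross_entropy(logits, labels) for logits in student_logits)

    # Ensemble prediction (simple average of logits) used as the soft target.
    ensemble = torch.stack(student_logits).mean(dim=0).detach()
    soft_target = F.softmax(ensemble / temperature, dim=1)

    # Distill each student toward the ensemble soft target via KL divergence.
    kd = sum(
        F.kl_div(F.log_softmax(logits / temperature, dim=1),
                 soft_target, reduction="batchmean") * temperature ** 2
        for logits in student_logits
    )

    # Illustrative diversity term: penalize pairwise similarity of the
    # students' predicted distributions (a stand-in for the CD loss idea).
    probs = [F.softmax(logits, dim=1) for logits in student_logits]
    div = 0.0
    for i in range(len(probs)):
        for j in range(i + 1, len(probs)):
            div = div + (probs[i] * probs[j]).sum(dim=1).mean()

    return ce + kd + cd_weight * div

In this sketch, each student branch is pulled toward the shared ensemble target while the diversity term discourages the branches' predictions from collapsing onto each other, which is the homogenization problem the paper targets.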