Paper Title
Multi-level Knowledge Distillation via Knowledge Alignment and Correlation
Paper Authors
Paper Abstract
Knowledge distillation (KD) has become an important technique for model compression and knowledge transfer. In this work, we first perform a comprehensive analysis of the knowledge transferred by different KD methods. We demonstrate that traditional KD methods, which minimize the KL divergence between the softmax outputs of the two networks, relate only to the knowledge alignment of individual samples. Meanwhile, recent contrastive learning-based KD methods mainly transfer relational knowledge between different samples, namely, knowledge correlation. Since it is important to transfer the full knowledge from teacher to student, we introduce Multi-level Knowledge Distillation (MLKD), which effectively considers both knowledge alignment and knowledge correlation. MLKD is task-agnostic and model-agnostic, and can easily transfer knowledge from supervised or self-supervised pretrained teachers. We show that MLKD improves the reliability and transferability of the learned representations. Experiments demonstrate that MLKD outperforms other state-of-the-art methods across a large number of experimental settings, including different (a) pretraining strategies, (b) network architectures, (c) datasets, and (d) tasks.
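For reference, below is a minimal sketch of the two kinds of knowledge the abstract distinguishes: the traditional KD objective (knowledge alignment, i.e. the KL divergence between temperature-scaled softmax outputs, following Hinton et al.) and one simple form of inter-sample relational transfer (knowledge correlation, here matching pairwise cosine-similarity matrices over a batch). This is an illustrative sketch, not the paper's MLKD implementation; the temperature value, the specific correlation form, and the function names are assumptions.

```python
# Illustrative sketch only; not the MLKD method from the paper.
import torch
import torch.nn.functional as F

def alignment_kd_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 4.0) -> torch.Tensor:
    """Traditional KD: KL divergence between temperature-softened
    class distributions of teacher and student (per-sample alignment)."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

def correlation_kd_loss(student_feats: torch.Tensor,
                        teacher_feats: torch.Tensor) -> torch.Tensor:
    """One simple form of relational transfer: match pairwise cosine-similarity
    matrices computed over the batch (inter-sample correlation)."""
    s = F.normalize(student_feats, dim=1)
    t = F.normalize(teacher_feats, dim=1)
    return F.mse_loss(s @ s.t(), t @ t.t())

# Usage sketch: combine both terms with the supervised loss during student training.
# total_loss = ce_loss + alignment_kd_loss(s_logits, t_logits) \
#                      + correlation_kd_loss(s_feats, t_feats)
```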