Paper Title

Better Supervisory Signals by Observing Learning Paths

Authors

Yi Ren, Shangmin Guo, Danica J. Sutherland

Abstract

Better-supervised models might have better performance. In this paper, we first clarify what makes for good supervision for a classification problem, and then explain two existing label refining methods, label smoothing and knowledge distillation, in terms of our proposed criterion. To further answer why and how better supervision emerges, we observe the learning path, i.e., the trajectory of the model's predictions during training, for each training sample. We find that the model can spontaneously refine "bad" labels through a "zig-zag" learning path, which occurs on both toy and real datasets. Observing the learning path not only provides a new perspective for understanding knowledge distillation, overfitting, and learning dynamics, but also reveals that the supervisory signal of a teacher network can be very unstable near the best points in training on real tasks. Inspired by this, we propose a new knowledge distillation scheme, Filter-KD, which improves downstream classification performance in various settings.
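
The abstract's key observation is that a teacher's per-sample predictions can fluctuate near the end of training, so the student may benefit from a smoothed version of that signal. Below is a minimal PyTorch sketch of one way such a filtered-teacher distillation could be set up: an exponential moving average (EMA) of the teacher's softmax outputs, tracked per training sample across checkpoints, serves as the soft target. The EMA filter, the `alpha` parameter, and all class and function names here are illustrative assumptions, not the paper's exact Filter-KD algorithm.

```python
# Sketch only: distill from an EMA-filtered teacher signal.
# The filtering scheme and hyperparameters below are assumptions for illustration.

import torch
import torch.nn.functional as F


class FilteredTeacherTargets:
    """Keeps an EMA of the teacher's softmax outputs for every training sample."""

    def __init__(self, num_samples: int, num_classes: int, alpha: float = 0.9):
        self.alpha = alpha
        # Start from the uniform distribution so early, noisy teacher
        # predictions are heavily damped.
        self.targets = torch.full((num_samples, num_classes), 1.0 / num_classes)

    @torch.no_grad()
    def update(self, sample_ids: torch.Tensor, teacher_logits: torch.Tensor) -> None:
        # Blend the new teacher predictions into the running average.
        probs = F.softmax(teacher_logits, dim=-1)
        self.targets[sample_ids] = (
            self.alpha * self.targets[sample_ids] + (1.0 - self.alpha) * probs
        )

    def soft_targets(self, sample_ids: torch.Tensor) -> torch.Tensor:
        return self.targets[sample_ids]


def distillation_loss(student_logits: torch.Tensor, soft_targets: torch.Tensor) -> torch.Tensor:
    """KL divergence between the student's predictions and the filtered teacher targets."""
    log_p_student = F.log_softmax(student_logits, dim=-1)
    return F.kl_div(log_p_student, soft_targets, reduction="batchmean")


# Usage sketch inside a training loop (teacher and student are nn.Modules,
# and the data loader yields per-sample indices):
#   filtered = FilteredTeacherTargets(num_samples=len(dataset), num_classes=10)
#   for epoch in range(num_epochs):
#       for sample_ids, x, y in loader:
#           filtered.update(sample_ids, teacher(x))
#           loss = distillation_loss(student(x), filtered.soft_targets(sample_ids))
```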
