Paper Title


Weighted Distillation with Unlabeled Examples

Paper Authors

Fotis Iliopoulos, Vasilis Kontonis, Cenk Baykal, Gaurav Menghani, Khoa Trinh, Erik Vee

Paper Abstract


Distillation with unlabeled examples is a popular and powerful method for training deep neural networks in settings where the amount of labeled data is limited: A large "teacher" neural network is trained on the labeled data available, and then it is used to generate labels on an unlabeled dataset (typically much larger in size). These labels are then utilized to train the smaller "student" model which will actually be deployed. Naturally, the success of the approach depends on the quality of the teacher's labels, since the student could be confused if trained on inaccurate data. This paper proposes a principled approach for addressing this issue based on a "debiasing" reweighting of the student's loss function tailored to the distillation training paradigm. Our method is hyper-parameter free, data-agnostic, and simple to implement. We demonstrate significant improvements on popular academic datasets and we accompany our results with a theoretical analysis which rigorously justifies the performance of our method in certain settings.
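The abstract describes a three-step pipeline: a teacher trained on the limited labeled data pseudo-labels a larger unlabeled set, and the student is then trained on those labels with a per-example reweighted loss. The following is a minimal PyTorch sketch of that pipeline, not the paper's implementation; in particular, the confidence-based weights used here are only a placeholder, since the paper's actual "debiasing" weights are not specified in the abstract.

# Minimal sketch of distillation with unlabeled examples and a reweighted
# student loss. The weight computation is a placeholder assumption; the
# paper's debiasing weights are derived differently (details in the paper).
import torch
import torch.nn as nn

torch.manual_seed(0)
num_classes, dim = 10, 32

# Large "teacher" and smaller "student" models (toy sizes for illustration).
teacher = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, num_classes))
student = nn.Sequential(nn.Linear(dim, num_classes))

# (1) Assume the teacher has already been trained on the labeled data.
teacher.eval()

# (2) The teacher generates labels on a (typically much larger) unlabeled set.
unlabeled_x = torch.randn(512, dim)
with torch.no_grad():
    teacher_logits = teacher(unlabeled_x)
    pseudo_labels = teacher_logits.argmax(dim=1)
    # Placeholder per-example weights based on teacher confidence; the
    # paper instead computes principled debiasing weights.
    weights = teacher_logits.softmax(dim=1).max(dim=1).values

# (3) Train the student with a per-example weighted cross-entropy loss.
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss(reduction="none")  # keep per-example losses
for _ in range(5):
    optimizer.zero_grad()
    per_example_loss = loss_fn(student(unlabeled_x), pseudo_labels)
    loss = (weights * per_example_loss).mean()
    loss.backward()
    optimizer.step()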
