Paper Title

Masked Generative Distillation

Paper Authors

Zhendong Yang, Zhe Li, Mingqi Shao, Dachuan Shi, Zehuan Yuan, Chun Yuan

Paper Abstract

Knowledge distillation has been applied successfully to various tasks. Current distillation algorithms usually improve the student's performance by imitating the teacher's output. This paper shows that the teacher can also improve the student's representation power by guiding the student's feature recovery. From this point of view, we propose Masked Generative Distillation (MGD), which is simple: we mask random pixels of the student's feature and force it to generate the teacher's full feature through a simple block. MGD is a truly general feature-based distillation method that can be utilized on various tasks, including image classification, object detection, semantic segmentation, and instance segmentation. We experiment with different models on extensive datasets, and the results show that all the students achieve excellent improvements. Notably, we boost ResNet-18 from 69.90% to 71.69% ImageNet top-1 accuracy, RetinaNet with a ResNet-50 backbone from 37.4 to 41.0 bounding-box mAP, SOLO based on ResNet-50 from 33.1 to 36.2 mask mAP, and DeepLabV3 based on ResNet-18 from 73.20 to 76.02 mIoU. Our code is available at https://github.com/yzd-v/MGD.
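To make the abstract's description concrete, below is a minimal PyTorch-style sketch of the masked feature-recovery idea: random spatial pixels of the student feature are zeroed out, a simple block regenerates the teacher's full feature, and an MSE loss penalizes the difference. This is not the authors' implementation; the conv-ReLU-conv generation block, the class name MGDLoss, and the mask_ratio/alpha hyper-parameters are illustrative assumptions, and the official repository should be consulted for the actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MGDLoss(nn.Module):
    """Sketch of masked generative distillation on one feature map (illustrative, not the official code)."""

    def __init__(self, channels, mask_ratio=0.65, alpha=1.0):
        super().__init__()
        self.mask_ratio = mask_ratio  # fraction of spatial pixels to mask (assumed hyper-parameter)
        self.alpha = alpha            # distillation loss weight (assumed hyper-parameter)
        # The "simple block" is assumed here to be conv -> ReLU -> conv.
        self.generator = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, feat_student, feat_teacher):
        n, c, h, w = feat_student.shape
        # Randomly zero out a fraction of the student's spatial pixels (shared across channels).
        keep = (torch.rand(n, 1, h, w, device=feat_student.device) > self.mask_ratio).float()
        masked = feat_student * keep
        # Force the student to generate the teacher's full feature from the masked one.
        generated = self.generator(masked)
        return self.alpha * F.mse_loss(generated, feat_teacher)
```

In training, such a loss would be computed on spatially aligned student and teacher feature maps (for example, backbone or neck outputs) and added to the task loss.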
