Paper Title
Knowledge Distillation via Weighted Ensemble of Teaching Assistants
Paper Authors
Paper Abstract
Knowledge distillation in machine learning is the process of transferring knowledge from a large model, called the teacher, to a smaller model, called the student. It is one of the techniques for compressing a large network (the teacher) into a smaller network (the student) that can be deployed on small devices such as mobile phones. When the size gap between the teacher and the student networks increases, the performance of the student network degrades. To address this problem, an intermediate model, known as the teaching assistant model, is employed between the teacher and the student, bridging the gap between them. In this research, we show that using multiple teaching assistant models can further improve the student model (the smaller model). We combine these multiple teaching assistant models using weighted ensemble learning, where a differential evolution optimization algorithm is used to generate the weight values.
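The abstract describes combining the outputs of several teaching-assistant (TA) models into a weighted ensemble whose weights are found by differential evolution, and then distilling that ensemble into the student. The sketch below is a minimal illustration of that idea, not the authors' implementation: the precomputed TA logits, the temperature, the loss weighting, and the choice of scipy.optimize.differential_evolution with a held-out negative log-likelihood objective for the weight search are all assumptions made for illustration.

```python
# Minimal sketch (assumptions, not the paper's released code): weight several
# TA models' logits, search the weights with differential evolution, and use
# the resulting soft targets in a standard soft-target distillation loss.
import numpy as np
from scipy.optimize import differential_evolution
from scipy.special import softmax, log_softmax

rng = np.random.default_rng(0)

# Placeholder held-out data: logits from 3 TAs over 500 samples, 10 classes.
num_tas, n, k = 3, 500, 10
ta_logits = rng.normal(size=(num_tas, n, k))   # assumed precomputed TA outputs
labels = rng.integers(0, k, size=n)            # held-out ground-truth labels
temperature = 4.0                              # softening temperature (assumption)

def ensemble_soft_targets(weights, logits, T):
    """Weighted ensemble of TA logits, softened with temperature T."""
    w = np.asarray(weights)
    w = w / (w.sum() + 1e-12)                  # normalize weights to sum to 1
    combined = np.tensordot(w, logits, axes=1) # (n, k) weighted sum of TA logits
    return softmax(combined / T, axis=1)

def weight_objective(weights):
    """Negative log-likelihood of the weighted TA ensemble on held-out labels."""
    probs = ensemble_soft_targets(weights, ta_logits, T=1.0)
    return -np.log(probs[np.arange(n), labels] + 1e-12).mean()

# Differential evolution searches the TA weight vector (each weight in [0, 1]).
result = differential_evolution(weight_objective, bounds=[(0.0, 1.0)] * num_tas,
                                seed=0, maxiter=50, tol=1e-6)
best_weights = result.x / (result.x.sum() + 1e-12)
print("TA weights found by differential evolution:", best_weights)

# The softened ensemble distribution serves as the distillation target: the
# student minimizes KL(ensemble || student) at the same temperature, plus the
# usual cross-entropy on hard labels.
targets = ensemble_soft_targets(best_weights, ta_logits, temperature)

def distillation_loss(student_logits, soft_targets, hard_labels, T, alpha=0.7):
    """Soft-target KD loss: alpha * T^2 * KL + (1 - alpha) * CE (alpha assumed)."""
    log_p_student = log_softmax(student_logits / T, axis=1)
    kl = np.sum(soft_targets * (np.log(soft_targets + 1e-12) - log_p_student),
                axis=1).mean()
    ce = -log_softmax(student_logits, axis=1)[np.arange(len(hard_labels)),
                                              hard_labels].mean()
    return alpha * (T ** 2) * kl + (1 - alpha) * ce

student_logits = rng.normal(size=(n, k))       # stand-in for student outputs
print("Example KD loss:", distillation_loss(student_logits, targets,
                                            labels, temperature))
```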