Paper Title

Representation Transfer by Optimal Transport

Paper Authors

Xuhong Li, Yves Grandvalet, Rémi Flamary, Nicolas Courty, Dejing Dou

Paper Abstract

Learning generic representations with deep networks requires massive training samples and significant computer resources. To learn a new specific task, an important issue is to transfer the generic teacher's representation to a student network. In this paper, we propose to use a metric between representations that is based on a functional view of neurons. We use optimal transport to quantify the match between two representations, yielding a distance that embeds some invariances inherent to the representation of deep networks. This distance defines a regularizer promoting the similarity of the student's representation with that of the teacher. Our approach can be used in any learning context where representation transfer is applicable. We experiment here on two standard settings: inductive transfer learning, where the teacher's representation is transferred to a student network of same architecture for a new related task, and knowledge distillation, where the teacher's representation is transferred to a student of simpler architecture for the same task (model compression). Our approach also lends itself to solving new learning problems; we demonstrate this by showing how to directly transfer the teacher's representation to a simpler architecture student for a new related task.
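To make the core idea concrete, below is a minimal sketch of an OT-based distance between two layers under the functional view of neurons described in the abstract: each neuron is represented by its vector of activations over a batch, so a layer becomes a point cloud that optimal transport can match across networks of different widths. This sketch assumes the POT library (`import ot`); the function name `ot_representation_distance`, the squared-Euclidean ground cost, and the uniform neuron weights are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)

def ot_representation_distance(teacher_acts, student_acts):
    """OT distance between two layers under a functional view of neurons.

    teacher_acts: (n_samples, n_teacher_neurons) activations on a batch
    student_acts: (n_samples, n_student_neurons) activations on the same batch
    Each neuron is described by its activation vector over the batch, so
    layers of different widths become comparable point clouds.
    """
    # Ground cost: squared Euclidean distance between neuron activation
    # profiles (columns of the activation matrices).
    M = ot.dist(teacher_acts.T, student_acts.T, metric="sqeuclidean")
    # Uniform weights over neurons: no neuron is privileged a priori.
    # (The paper's actual weighting scheme may differ.)
    a = np.full(teacher_acts.shape[1], 1.0 / teacher_acts.shape[1])
    b = np.full(student_acts.shape[1], 1.0 / student_acts.shape[1])
    # Exact optimal transport cost between the two neuron point clouds.
    return ot.emd2(a, b, M)
```

Because the matching is computed over neurons rather than fixed neuron indices, the resulting distance is invariant to permutations of neurons within a layer, one of the invariances the abstract alludes to. In the transfer settings described above, such a distance would be added to the student's task loss as a regularization term with a tunable weight; the weight and the choice of which layers to match are design choices not specified here.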
