Hub-Pathway: Transfer Learning from A Hub of Pre-trained Models

Paper Title

Hub-Pathway: Transfer Learning from A Hub of Pre-trained Models

Authors

Yang Shu, Zhangjie Cao, Ziyang Zhang, Jianmin Wang, Mingsheng Long

Abstract

Transfer learning aims to leverage knowledge from pre-trained models to benefit the target task. Prior transfer learning work mainly transfers from a single model. However, with the emergence of deep models pre-trained from different resources, model hubs consisting of diverse models with various architectures, pre-training datasets, and learning paradigms are now available. Directly applying single-model transfer learning methods to each model wastes the abundant knowledge of the model hub and incurs a high computational cost. In this paper, we propose a Hub-Pathway framework to enable knowledge transfer from a model hub. The framework generates data-dependent pathway weights, based on which we assign the pathway routes at the input level to decide which pre-trained models are activated and passed through, and then set the pathway aggregation at the output level to aggregate the knowledge from different models to make predictions. The proposed framework can be trained end-to-end with the target task-specific loss, where it learns to explore better pathway configurations and exploit the knowledge in pre-trained models for each target datum. We utilize a noisy pathway generator and design an exploration loss to further explore different pathways throughout the model hub. To fully exploit the knowledge in pre-trained models, each model is further trained on the specific data that activate it, which ensures its performance and enhances knowledge transfer. Experimental results on computer vision and reinforcement learning tasks demonstrate that the proposed Hub-Pathway framework achieves state-of-the-art performance for model hub transfer learning.
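
The routing and aggregation steps described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the linear gate, the Gaussian noise on the logits, the top-k selection rule, and all names (`pathway_weights`, `route_and_aggregate`, `gate`, `k`, `noise_std`) are assumptions made for illustration. The abstract states only that pathway weights are data-dependent, that they decide which models are activated, and that outputs are aggregated. The exploration loss and the per-model fine-tuning are omitted.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pathway_weights(x, gate, noise_std=0.0, rng=None):
    """Data-dependent weights over the hub models.

    Assumption: a linear gate (one weight row per model) followed by a
    softmax; optional Gaussian noise on the logits stands in for the
    "noisy pathway generator" mentioned in the abstract.
    """
    rng = rng or random
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate]
    if noise_std > 0:
        logits = [l + rng.gauss(0.0, noise_std) for l in logits]
    return softmax(logits)


def route_and_aggregate(x, models, gate, k=2, noise_std=0.0, rng=None):
    """Activate the k highest-weighted models and combine their outputs."""
    weights = pathway_weights(x, gate, noise_std, rng)
    # Input-level routing (assumed top-k): only these models are evaluated.
    active = sorted(range(len(models)), key=lambda i: weights[i], reverse=True)[:k]
    norm = sum(weights[i] for i in active)
    outputs = {i: models[i](x) for i in active}
    dim = len(next(iter(outputs.values())))
    # Output-level aggregation: renormalized weighted sum of active outputs.
    aggregated = [
        sum(weights[i] / norm * outputs[i][d] for i in active) for d in range(dim)
    ]
    return aggregated, sorted(active)


# Toy hub: three stand-in "pre-trained models" that return fixed predictions.
toy_models = [
    lambda x: [1.0, 0.0],
    lambda x: [0.0, 1.0],
    lambda x: [0.5, 0.5],
]
toy_gate = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
prediction, activated = route_and_aggregate([2.0, 1.0], toy_models, toy_gate, k=2)
print(activated, prediction)
```

Because only the `k` selected models are evaluated for each input, the cost per datum in this sketch scales with `k` rather than with the size of the hub, which is the motivation the abstract gives for routing at the input level.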
