Paper Title
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs
Paper Authors
Paper Abstract
To build an artificial neural network like the biological intelligence system, recent works have unified numerous tasks into a generalist model, which processes various tasks with shared parameters and does not have any task-specific modules. While generalist models achieve promising results on various benchmarks, they suffer performance degradation on some tasks compared with task-specialized models. In this work, we find that interference among different tasks and modalities is the main factor behind this phenomenon. To mitigate such interference, we introduce Conditional Mixture-of-Experts (Conditional MoEs) into generalist models. Routing strategies under different levels of conditions are proposed to take both the training/inference cost and generalization ability into account. By incorporating the proposed Conditional MoEs, the recently proposed generalist model Uni-Perceiver can effectively mitigate the interference across tasks and modalities, and achieves state-of-the-art results on a series of downstream tasks via prompt tuning on 1% of downstream data. Moreover, the introduction of Conditional MoEs preserves the generalization ability of generalist models to conduct zero-shot inference on new tasks, e.g., video-text retrieval and video captioning. Code and pre-trained generalist models shall be released.
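To make the idea of routing on conditions (rather than on token content alone) concrete, below is a minimal PyTorch sketch of a conditional MoE feed-forward layer. It is an illustration under stated assumptions, not the paper's implementation: the class name `ConditionalMoE`, the use of an integer attribute id (e.g., a modality or task tag) embedded and fed to the router, and top-k softmax gating are all hypothetical choices; the abstract does not specify which attributes the actual routing strategies condition on at each level.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalMoE(nn.Module):
    """Sketch: an MoE feed-forward block whose expert routing is driven by a
    learned embedding of a condition id (e.g., modality/task attribute),
    instead of the token representation itself."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int,
                 num_conditions: int, top_k: int = 1):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        # Hypothetical conditioning: one embedding per attribute id; the router
        # sees only this embedding, so all tokens sharing a condition share a route.
        self.cond_embed = nn.Embedding(num_conditions, d_model)
        self.router = nn.Linear(d_model, num_experts)
        self.top_k = top_k

    def forward(self, x: torch.Tensor, cond_ids: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); cond_ids: (batch, seq) integer attribute ids
        gate_logits = self.router(self.cond_embed(cond_ids))       # (B, S, E)
        weights = F.softmax(gate_logits, dim=-1)
        topk_w, topk_idx = weights.topk(self.top_k, dim=-1)        # (B, S, K)
        topk_w = topk_w / topk_w.sum(dim=-1, keepdim=True)         # renormalize gates

        out = torch.zeros_like(x)
        # Dense-compute sketch: every expert is evaluated and mixed with its gate;
        # a real sparse implementation would dispatch tokens to experts instead.
        for e, expert in enumerate(self.experts):
            w = (topk_w * (topk_idx == e)).sum(dim=-1, keepdim=True)  # (B, S, 1)
            if w.any():
                out = out + w * expert(x)
        return out
```

Because the gate depends only on the condition embedding, tokens with the same attribute (say, all image tokens, or all tokens of one task) are routed identically, which is one plausible way such conditioning could reduce interference between tasks and modalities while keeping routing decisions stable at inference time.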