Paper Title
Prompting Large Pre-trained Vision-Language Models For Compositional Concept Learning
Paper Authors
Paper Abstract
This work explores the zero-shot compositional learning ability of large pre-trained vision-language models (VLMs) within the prompt-based learning framework and proposes a model (\textit{PromptCompVL}) to solve the compositional zero-shot learning (CZSL) problem. \textit{PromptCompVL} makes two design choices: first, it uses soft-prompting instead of hard-prompting to inject learnable parameters that reprogram VLMs for compositional learning. Second, to address the compositional challenge, it uses a soft-embedding layer to learn primitive concepts in different combinations. By combining soft-embedding and soft-prompting, \textit{PromptCompVL} achieves state-of-the-art performance on the MIT-States dataset. Furthermore, our proposed model achieves consistent improvements over other CLIP-based methods, which shows the effectiveness of the proposed prompting strategies for CZSL.
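The two design choices in the abstract can be illustrated with a minimal sketch: learnable soft-prompt context vectors replace a fixed hard template, and each primitive concept (attribute or object) gets one learnable embedding that is reused across compositions. All names here (`soft_prompt`, `compose_prompt`, the toy attributes and objects) are illustrative assumptions, not the paper's actual implementation, which would feed these embeddings into a CLIP-style text encoder and train them end-to-end.

```python
import numpy as np

# Illustrative sketch only; dimensions are tiny for readability.
rng = np.random.default_rng(0)
d = 8        # token-embedding dimension (assumed)
n_ctx = 3    # number of learnable soft-prompt context vectors (assumed)

# Soft-prompting: learnable context vectors instead of a hard template
# such as "a photo of a [attribute] [object]".
soft_prompt = rng.normal(size=(n_ctx, d))

# Soft-embedding: one learnable vector per primitive concept,
# shared across all attribute-object compositions.
attr_emb = {"wet": rng.normal(size=d), "dry": rng.normal(size=d)}
obj_emb = {"dog": rng.normal(size=d), "road": rng.normal(size=d)}

def compose_prompt(attr: str, obj: str) -> np.ndarray:
    """Stack [soft context; attribute; object] into one prompt sequence."""
    return np.vstack([soft_prompt, attr_emb[attr], obj_emb[obj]])

# Unseen pairs at test time (the zero-shot case) reuse the same primitive
# embeddings, e.g. "wet road" even if only "wet dog" was seen in training.
prompt = compose_prompt("wet", "road")
print(prompt.shape)  # (n_ctx + 2, d) = (5, 8)
```

In a full model, the stacked sequence would be encoded by the frozen VLM text encoder and matched against the image embedding; only the soft prompt and primitive embeddings would receive gradients.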