Unsupervised Prompt Learning for Vision-Language Models

Paper Title

Unsupervised Prompt Learning for Vision-Language Models

Authors

Tony Huang, Jack Chu, Fangyun Wei

Abstract

Contrastive vision-language models like CLIP have shown great progress in transfer learning. At inference time, a proper text description, known as a prompt, must be carefully designed to classify the given images correctly. To avoid laborious prompt engineering, recent works such as CoOp, CLIP-Adapter and Tip-Adapter propose adapting vision-language models to downstream image recognition tasks using a small set of labeled data. Although these methods achieve promising improvements, requiring labeled data from the target datasets may restrict scalability. In this paper, we explore a different scenario, in which labels for the target datasets are not provided, and we present an unsupervised prompt learning (UPL) approach that avoids prompt engineering while simultaneously improving the transfer performance of CLIP-like vision-language models. As far as we know, UPL is the first work to introduce unsupervised learning into prompt learning. Experimentally, UPL outperforms the original CLIP with prompt engineering on ImageNet as well as 10 other datasets. An enhanced version of UPL is even competitive with 8-shot CoOp and 8-shot Tip-Adapter on most datasets. Code and models are available at https://github.com/tonyhuang2022/UPL.
