Paper Title

ConStruct-VL: Data-Free Continual Structured VL Concepts Learning

Authors

James Seale Smith, Paola Cascante-Bonilla, Assaf Arbelle, Donghyun Kim, Rameswar Panda, David Cox, Diyi Yang, Zsolt Kira, Rogerio Feris, Leonid Karlinsky

Abstract

Recently, large-scale pre-trained Vision-and-Language (VL) foundation models have demonstrated remarkable capabilities in many zero-shot downstream tasks, achieving competitive results for recognizing objects defined by as little as short text prompts. However, it has also been shown that VL models are still brittle in Structured VL Concept (SVLC) reasoning, such as the ability to recognize object attributes, states, and inter-object relations. This leads to reasoning mistakes, which need to be corrected as they occur by teaching VL models the missing SVLC skills; often this must be done using private data where the issue was found, which naturally leads to a data-free continual (no task-id) VL learning setting. In this work, we introduce the first Continual Data-Free Structured VL Concepts Learning (ConStruct-VL) benchmark and show it is challenging for many existing data-free CL strategies. We, therefore, propose a data-free method comprised of a new approach of Adversarial Pseudo-Replay (APR) which generates adversarial reminders of past tasks from past task models. To use this method efficiently, we also propose a continual parameter-efficient Layered-LoRA (LaLo) neural architecture allowing no-memory-cost access to all past models at train time. We show this approach outperforms all data-free methods by as much as ~7% while even matching some levels of experience-replay (prohibitive for applications where data-privacy must be preserved). Our code is publicly available at https://github.com/jamessealesmith/ConStruct-VL
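
The abstract names two components: a Layered-LoRA (LaLo) architecture that keeps all past-task models reachable at no extra memory cost, and Adversarial Pseudo-Replay (APR), which crafts adversarial "reminders" of past tasks from the past-task models. The abstract does not give implementation details, so the following is only a minimal, hypothetical PyTorch sketch of how such a layered adapter stack and an adversarial reminder step could look; the class and function names, the FGSM-style attack, and the cross-entropy/MSE losses are illustrative assumptions, not the paper's method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LayeredLoRALinear(nn.Module):
    """Frozen linear layer plus one low-rank (LoRA) delta per task.

    Evaluating with only the first k deltas reproduces the model as it was
    after task k, so every past-task model stays reachable without storing
    separate weight copies (the "no-memory-cost access" the abstract mentions).
    """

    def __init__(self, in_features, out_features, rank=4):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)
        self.base.bias.requires_grad_(False)
        self.rank = rank
        self.loras = nn.ModuleList()  # one low-rank adapter per task

    def add_task(self):
        down = nn.Linear(self.base.in_features, self.rank, bias=False)
        up = nn.Linear(self.rank, self.base.out_features, bias=False)
        nn.init.zeros_(up.weight)  # new adapter starts as a zero delta
        self.loras.append(nn.Sequential(down, up))

    def forward(self, x, upto=None):
        # upto=k evaluates the model as it stood after task k (1-indexed);
        # upto=None uses every adapter learned so far.
        n = len(self.loras) if upto is None else upto
        out = self.base(x)
        for lora in list(self.loras)[:n]:
            out = out + lora(x)
        return out


def adversarial_pseudo_replay(model, x, y, past_task, eps=0.05):
    """Craft an FGSM-style perturbation of the current batch against the
    *past*-task model, serving as a data-free reminder of that task.
    (The concrete attack and loss used by the paper are not specified in
    the abstract; cross-entropy + FGSM here is purely illustrative.)"""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv, upto=past_task), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    return (x_adv + eps * grad.sign()).detach()


if __name__ == "__main__":
    torch.manual_seed(0)
    model = LayeredLoRALinear(16, 3)
    model.add_task()  # task 1 (already learned)
    model.add_task()  # task 2 (currently being learned)
    x, y = torch.randn(8, 16), torch.randint(0, 3, (8,))
    x_rem = adversarial_pseudo_replay(model, x, y, past_task=1)
    # Distillation-style reminder: keep the current model close to the
    # task-1 model's behavior on the adversarial inputs.
    reminder_loss = F.mse_loss(model(x_rem), model(x_rem, upto=1).detach())
    print(float(reminder_loss))
```

The key design point this sketch tries to capture is that "replaying" a past task needs neither stored data nor stored model checkpoints: truncating the adapter stack recovers the past-task model, and the adversarial inputs generated against it stand in for past-task examples.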
