Compositional Few-Shot Recognition with Primitive Discovery and Enhancing

Paper Title

Compositional Few-Shot Recognition with Primitive Discovery and Enhancing

Authors

Yixiong Zou, Shanghang Zhang, Ke Chen, Yonghong Tian, Yaowei Wang, José M. F. Moura

Abstract

Few-shot learning (FSL) aims to recognize novel classes given only a few training samples, which remains a great challenge for deep learning. Humans, however, can easily recognize novel classes from only a few samples. A key component of this ability is the compositional recognition that humans can perform, which has been well studied in cognitive science but is not well explored in FSL. Inspired by this capability, and to imitate humans' ability to learn visual primitives and compose them to recognize novel classes, we propose an FSL approach that learns a feature representation composed of important primitives. The representation is jointly trained with two parts, i.e. primitive discovery and primitive enhancing. In primitive discovery, we focus on learning primitives related to object parts by self-supervision from the order of image splits, avoiding extra laborious annotations and alleviating the effect of semantic gaps. In primitive enhancing, inspired by current studies on the interpretability of deep networks, we provide our composition view of the FSL baseline model. To modify this model for effective composition, inspired by both mathematical deduction and biological studies (the Hebbian Learning rule and the Winner-Take-All mechanism), we propose a soft composition mechanism that enlarges the activation of important primitives while reducing that of the others, so as to enhance the influence of important primitives and better utilize them to compose novel classes. Extensive experiments on public benchmarks are conducted on both few-shot image classification and video recognition tasks. Our method achieves state-of-the-art performance on all of these datasets and shows better interpretability.
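
The abstract describes soft composition only in words: enlarge the activations of important primitives and reduce the rest. As a rough illustration of that idea, and not the formulation used in the paper, the sketch below sharpens a vector of non-negative activations by raising each entry to a power and rescaling so the total activation is unchanged. The function name `soft_compose`, the `power` parameter, and the power-based sharpening itself are all assumptions made for this example.

```python
def soft_compose(activations, power=2.0):
    """Sharpen non-negative primitive activations (illustrative only).

    Assumption: "important" primitives are simply those with larger
    activation. Each activation is raised to `power` (> 1), then the
    vector is rescaled so its sum matches the original sum. Large
    entries end up larger and small entries smaller, a soft analogue
    of winner-take-all. This is not the paper's exact mechanism.
    """
    if any(a < 0 for a in activations):
        raise ValueError("activations must be non-negative (e.g. post-ReLU)")
    total = sum(activations)
    if total == 0:
        # Nothing is active, so there is nothing to sharpen.
        return list(activations)
    powered = [a ** power for a in activations]
    scale = total / sum(powered)  # keep the overall activation mass fixed
    return [p * scale for p in powered]


# Hypothetical activations of three primitives for one image.
acts = [0.1, 0.2, 0.7]
enhanced = soft_compose(acts)
```

With `power=1.0` the vector is returned unchanged up to rounding; larger values push the result closer to a hard winner-take-all selection.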
