Your "Flamingo" is My "Bird": Fine-Grained, or Not

Paper Title

Your "Flamingo" is My "Bird": Fine-Grained, or Not

Authors

Dongliang Chang, Kaiyue Pang, Yixiao Zheng, Zhanyu Ma, Yi-Zhe Song, Jun Guo

Abstract

Is what you see in Figure 1 a "flamingo" or a "bird"? That is the question we ask in this paper. While fine-grained visual classification (FGVC) strives to arrive at the former, for the majority of us non-experts just "bird" would probably suffice. The real question is therefore how to tailor predictions to the different definitions of "fine-grained" that come with different levels of expertise. To that end, we re-envisage the traditional FGVC setting, moving from single-label classification to top-down traversal of a pre-defined coarse-to-fine label hierarchy, so that our answer becomes "bird" -> "Phoenicopteriformes" -> "Phoenicopteridae" -> "flamingo". To approach this new problem, we first conduct a comprehensive human study, which confirms that most participants prefer multi-granularity labels, regardless of whether they consider themselves experts. We then discover the key intuition that coarse-level label prediction hinders fine-grained feature learning, whereas fine-level features improve the learning of the coarse-level classifier. This discovery enables us to design a very simple yet surprisingly effective solution to our new problem, in which we (i) use level-specific classification heads to disentangle coarse-level features from fine-grained ones, and (ii) allow finer-grained features to participate in coarser-grained label predictions, which in turn helps with better disentanglement. Experiments show that our method achieves superior performance in the new FGVC setting, and also outperforms the state of the art on the traditional single-label FGVC problem. Thanks to its simplicity, our method can be easily implemented on top of any existing FGVC framework and is parameter-free.
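The top-down traversal described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `HIERARCHY` table, the `traverse` function, the `max_depth` parameter, the second branch ("Passeriformes", "Corvidae", "crow"), and all score values are hypothetical and chosen only for illustration. The sketch assumes that some classifier has already produced a score for every label, and that a reader's level of expertise can be modelled as the depth at which the traversal stops.

```python
# Hypothetical coarse-to-fine label hierarchy: each label maps to its children.
# Only the flamingo branch comes from the abstract; the crow branch is made up
# so that each step of the traversal has a real choice to make.
HIERARCHY = {
    "bird": ["Phoenicopteriformes", "Passeriformes"],
    "Phoenicopteriformes": ["Phoenicopteridae"],
    "Passeriformes": ["Corvidae"],
    "Phoenicopteridae": ["flamingo"],
    "Corvidae": ["crow"],
}


def traverse(root, scores, max_depth=None):
    """Walk the hierarchy top-down, picking the best-scoring child each step.

    `scores` maps label names to classifier scores (assumed inputs).
    `max_depth` caps the number of labels returned; None means go to a leaf.
    """
    path = [root]
    node = root
    while node in HIERARCHY and (max_depth is None or len(path) < max_depth):
        # Restrict the choice to children of the current node, so the
        # predicted labels always form a valid root-to-leaf prefix.
        node = max(HIERARCHY[node], key=lambda child: scores.get(child, 0.0))
        path.append(node)
    return path


# Made-up scores standing in for the outputs of level-specific heads.
scores = {
    "Phoenicopteriformes": 0.90, "Passeriformes": 0.10,
    "Phoenicopteridae": 0.95, "Corvidae": 0.05,
    "flamingo": 0.97, "crow": 0.03,
}

expert_answer = traverse("bird", scores)               # full path to the leaf
novice_answer = traverse("bird", scores, max_depth=1)  # stops at "bird"
print(" -> ".join(expert_answer))
print(" -> ".join(novice_answer))
```

With these made-up scores, the unrestricted call returns the full four-label path ending in "flamingo", while `max_depth=1` returns only "bird". Restricting each step to the children of the current node is one simple way to keep the multi-granularity answer consistent across levels; the paper itself may enforce consistency differently.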
