Paper Title

What leads to generalization of object proposals?

Paper Authors

Rui Wang, Dhruv Mahajan, Vignesh Ramanathan

Paper Abstract

Object proposal generation is often the first step in many detection models. It is lucrative to train a good proposal model that generalizes to unseen classes, as this could help scale detection models to a larger number of classes with fewer annotations. Motivated by this, we study how a detection model trained on a small set of source classes can provide proposals that generalize to unseen classes. We systematically study the properties of the dataset - visual diversity and label space granularity - required for good generalization. We show the trade-off between using fine-grained labels and coarse labels. We introduce the idea of prototypical classes: a set of sufficient and necessary classes required to train a detection model to obtain generalized proposals in a more data-efficient way. On the Open Images V4 dataset, we show that only 25% of the classes need to be selected to form such a prototypical set. The resulting proposals from a model trained with these classes are only 4.3% worse, in terms of average recall (AR), than proposals from a model trained with all the classes. We also demonstrate that a Faster R-CNN model leads to better generalization of proposals compared to a single-stage network like RetinaNet.
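The evaluation metric referenced in the abstract, average recall (AR), can be illustrated with a short sketch. The snippet below is not the authors' code: the function names, the COCO-style IoU thresholds (0.5 to 0.95 in steps of 0.05), and the simplified matching rule (a ground-truth box counts as recalled if any of the top-k proposals overlaps it above the threshold) are assumptions. It shows how one might measure whether class-agnostic proposals from a model trained on source classes cover ground-truth boxes of held-out classes.

# Minimal sketch (assumed, not the authors' implementation) of AR@top_k for
# class-agnostic proposals evaluated against ground-truth boxes of unseen classes.
# Boxes are arrays of [x1, y1, x2, y2].
import numpy as np

def iou_matrix(proposals: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Pairwise IoU between proposals (N, 4) and ground-truth boxes (M, 4)."""
    x1 = np.maximum(proposals[:, None, 0], gt[None, :, 0])
    y1 = np.maximum(proposals[:, None, 1], gt[None, :, 1])
    x2 = np.minimum(proposals[:, None, 2], gt[None, :, 2])
    y2 = np.minimum(proposals[:, None, 3], gt[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_p = (proposals[:, 2] - proposals[:, 0]) * (proposals[:, 3] - proposals[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    return inter / (area_p[:, None] + area_g[None, :] - inter)

def average_recall(proposals_per_image, gt_per_image, top_k=100,
                   iou_thresholds=np.arange(0.5, 1.0, 0.05)):
    """AR@top_k: recall of held-out-class GT boxes, averaged over IoU thresholds."""
    recalls = []
    for thr in iou_thresholds:
        matched, total = 0, 0
        for props, gt in zip(proposals_per_image, gt_per_image):
            if len(gt) == 0:
                continue
            total += len(gt)
            if len(props) == 0:
                continue
            ious = iou_matrix(props[:top_k], gt)              # (k, M)
            matched += int((ious.max(axis=0) >= thr).sum())   # GT boxes covered
        recalls.append(matched / max(total, 1))
    return float(np.mean(recalls))

Under this setup, comparing average_recall for proposals generated by a model trained on a candidate prototypical subset against proposals from a model trained on all classes gives the kind of AR gap (e.g. the 4.3% figure above) the abstract reports.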
