Box2Mask: Box-supervised Instance Segmentation via Level-set Evolution

Paper title

Box2Mask: Box-supervised Instance Segmentation via Level-set Evolution

Authors

Wentong Li, Wenyu Liu, Jianke Zhu, Miaomiao Cui, Risheng Yu, Xiansheng Hua, Lei Zhang

Abstract

In contrast to fully supervised methods using pixel-wise mask labels, box-supervised instance segmentation takes advantage of simple box annotations, which has recently attracted increasing research attention. This paper presents a novel single-shot instance segmentation approach, namely Box2Mask, which integrates the classical level-set evolution model into deep neural network learning to achieve accurate mask prediction with only bounding box supervision. Specifically, both the input image and its deep features are employed to evolve the level-set curves implicitly, and a local consistency module based on a pixel affinity kernel is used to mine the local context and spatial relations. Two types of single-stage frameworks, i.e., CNN-based and transformer-based frameworks, are developed to empower the level-set evolution for box-supervised instance segmentation, and each framework consists of three essential components: instance-aware decoder, box-level matching assignment and level-set evolution. By minimizing the level-set energy function, the mask map of each instance can be iteratively optimized within its bounding box annotation. The experimental results on five challenging testbeds, covering general scenes, remote sensing, medical and scene text images, demonstrate the outstanding performance of our proposed Box2Mask approach for box-supervised instance segmentation. In particular, with the Swin-Transformer large backbone, our Box2Mask obtains 42.4% mask AP on COCO, which is on par with the recently developed fully mask-supervised methods. The code is available at: https://github.com/LiWentomng/boxlevelset.
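
To make the idea of minimizing a level-set energy inside a bounding box more concrete, here is a minimal sketch. It assumes a Chan-Vese-style region energy on a single-channel image, which the abstract does not specify; the function name `region_energy`, the `(x0, y0, x1, y1)` box convention, and the toy data are all hypothetical and are not taken from the Box2Mask code.

```python
import numpy as np


def region_energy(image, mask_prob, box, eps=1e-8):
    """Chan-Vese-style region energy restricted to one bounding box.

    Assumptions (illustrative only, not the authors' implementation):
      image     -- (H, W) float array, a single input or feature channel
      mask_prob -- (H, W) float array in [0, 1], soft foreground mask
      box       -- (x0, y0, x1, y1), with x1 and y1 exclusive
    """
    x0, y0, x1, y1 = box
    u = image[y0:y1, x0:x1]          # pixels inside the box
    phi = mask_prob[y0:y1, x0:x1]    # soft mask inside the box

    # Mean intensity of the predicted foreground and background regions.
    c_in = (u * phi).sum() / (phi.sum() + eps)
    c_out = (u * (1.0 - phi)).sum() / ((1.0 - phi).sum() + eps)

    # Low energy means each region is close to uniform around its own mean.
    inside = ((u - c_in) ** 2 * phi).sum()
    outside = ((u - c_out) ** 2 * (1.0 - phi)).sum()
    return float(inside + outside)


# Toy example: a bright 3x3 object inside a 6x6 box.
image = np.zeros((8, 8))
image[2:5, 2:5] = 1.0
box = (1, 1, 7, 7)

good_mask = image.copy()             # mask that matches the object
flat_mask = np.full((8, 8), 0.5)     # uninformative mask

print(region_energy(image, good_mask, box))
print(region_energy(image, flat_mask, box))
```

In this sketch the mask that follows the object boundary gets a lower energy than the uninformative one, so a network trained to reduce such an energy is pushed toward object-aligned masks using only the box as supervision.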
