Paper Title
Two-Stream Networks for Object Segmentation in Videos
Authors
Abstract
Existing matching-based approaches perform video object segmentation (VOS) by retrieving support features from a pixel-level memory. Some pixels, however, have no correspondence in the memory (i.e., they are unseen), which inevitably limits segmentation performance. In this paper, we present a Two-Stream Network (TSN). Our TSN includes (i) a pixel stream with a conventional pixel-level memory, which segments the seen pixels based on their pixel-level memory retrieval; (ii) an instance stream for the unseen pixels, where a holistic understanding of the instance is obtained with dynamic segmentation heads conditioned on the features of the target instance; and (iii) a pixel division module that generates a routing map, with which the output embeddings of the two streams are fused together. The compact instance stream effectively improves the segmentation accuracy of the unseen pixels, while fusing the two streams with the adaptive routing map leads to an overall performance boost. Through extensive experiments, we demonstrate the effectiveness of our proposed TSN, and we also report state-of-the-art performance of 86.1% on YouTube-VOS 2018 and 87.5% on the DAVIS-2017 validation split.
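The fusion step in (iii) can be illustrated with a minimal sketch. The abstract does not give the fusion formula, so this assumes a simple per-pixel convex blend: a routing value near 1 (a "seen" pixel) favours the pixel stream, and a value near 0 (an "unseen" pixel) favours the instance stream. The function name `fuse_streams`, the nested-list layout (channels, then rows, then columns), and the blend rule itself are all hypothetical and are not taken from the paper.

```python
def fuse_streams(pixel_emb, instance_emb, routing_map):
    """Blend two embeddings per pixel (assumed rule, not the paper's).

    pixel_emb, instance_emb: nested lists indexed [channel][row][col].
    routing_map: nested list indexed [row][col], values assumed in [0, 1].
    """
    fused = []
    for pix_channel, ins_channel in zip(pixel_emb, instance_emb):
        fused_channel = []
        for pix_row, ins_row, route_row in zip(pix_channel, ins_channel, routing_map):
            # r weights the pixel stream, (1 - r) weights the instance stream.
            fused_channel.append(
                [r * p + (1.0 - r) * q for p, q, r in zip(pix_row, ins_row, route_row)]
            )
        fused.append(fused_channel)
    return fused


# Toy example: one channel on a 2x2 grid.
pixel = [[[1.0, 2.0], [3.0, 4.0]]]
instance = [[[10.0, 20.0], [30.0, 40.0]]]
routing = [[1.0, 0.0], [0.5, 1.0]]
print(fuse_streams(pixel, instance, routing))
# [[[1.0, 20.0], [16.5, 4.0]]]
```

Under this assumption, a pixel routed entirely to one stream keeps that stream's embedding unchanged, and intermediate routing values interpolate between the two.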