Paper Title

Learning Video Salient Object Detection Progressively from Unlabeled Videos

Authors

Binwei Xu, Haoran Liang, Wentian Ni, Weihua Gong, Ronghua Liang, Peng Chen

Abstract

Recent deep learning-based video salient object detection (VSOD) has achieved some breakthroughs, but these methods rely on expensive annotated videos with pixel-wise annotations, weak annotations, or partial pixel-wise annotations. In this paper, based on the similarities and differences between VSOD and image salient object detection (SOD), we propose a novel VSOD method via a progressive framework that locates and segments salient objects in sequence without utilizing any video annotation. To efficiently use the knowledge learned from SOD datasets for VSOD, we introduce dynamic saliency to compensate for the lack of motion information in SOD during the locating process, while retaining the same fine segmenting process. Specifically, an algorithm for generating spatiotemporal location labels is proposed, which consists of generating high-saliency location labels and tracking salient objects in adjacent frames. Based on these location labels, a two-stream locating network that introduces an optical flow branch for video salient object locating is presented. Although our method does not require labeled videos at all, experimental results on five public benchmarks (DAVIS, FBMS, ViSal, VOS, and DAVSOD) demonstrate that our proposed method is competitive with fully supervised methods and outperforms state-of-the-art weakly supervised and unsupervised methods.
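The spatiotemporal location-label step described above can be illustrated with a minimal sketch: threshold a static saliency prediction into high-confidence foreground/background (leaving uncertain pixels ignored), then keep a current-frame region only if it sufficiently overlaps the previous frame's foreground. The thresholds, the 255 "ignore" value, and the IoU-based tracking rule below are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def high_saliency_location_label(sal_map, hi=0.8, lo=0.2):
    """Turn a static saliency prediction into a coarse location label.

    Pixels above `hi` become foreground (1), below `lo` background (0);
    the rest are marked 255 so a loss can ignore them. Thresholds are
    illustrative, not taken from the paper.
    """
    label = np.full(sal_map.shape, 255, dtype=np.uint8)
    label[sal_map >= hi] = 1
    label[sal_map <= lo] = 0
    return label

def track_label(prev_label, cur_sal, hi=0.8, min_iou=0.3):
    """Crude adjacent-frame tracking: accept the current frame's
    high-saliency region only if it overlaps the previous frame's
    foreground enough; otherwise fall back to the previous mask."""
    cur_fg = cur_sal >= hi
    prev_fg = prev_label == 1
    inter = np.logical_and(cur_fg, prev_fg).sum()
    union = np.logical_or(cur_fg, prev_fg).sum()
    iou = inter / union if union else 0.0
    return cur_fg if iou >= min_iou else prev_fg
```

In the full method these labels then supervise a two-stream locating network whose second branch consumes optical flow, so the tracked masks inject the motion cue that plain SOD labels lack.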
