Paper Title
Discovering Objects that Can Move
Paper Authors
Paper Abstract
This paper studies the problem of object discovery -- separating objects from the background without manual labels. Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions. However, by relying on appearance alone, these methods fail to separate objects from the background in cluttered scenes. This is a fundamental limitation since the definition of an object is inherently ambiguous and context-dependent. To resolve this ambiguity, we choose to focus on dynamic objects -- entities that can move independently in the world. We then scale the recent auto-encoder based frameworks for unsupervised object discovery from toy synthetic images to complex real-world scenes. To this end, we simplify their architecture, and augment the resulting model with a weak learning signal from general motion segmentation algorithms. Our experiments demonstrate that, despite only capturing a small subset of the objects that move, this signal is enough to generalize to segment both moving and static instances of dynamic objects. We show that our model scales to a newly collected, photo-realistic synthetic dataset with street driving scenarios. Additionally, we leverage ground truth segmentation and flow annotations in this dataset for thorough ablation and evaluation. Finally, our experiments on the real-world KITTI benchmark demonstrate that the proposed approach outperforms both heuristic- and learning-based methods by capitalizing on motion cues.
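The core idea of the weak learning signal can be sketched in code: a motion segmentation algorithm labels only some pixels as belonging to moving objects, and those partial labels supervise the slots of an object-discovery model. The sketch below is a hypothetical numpy illustration, not the paper's actual training code; the function name `weak_motion_loss`, the greedy slot-to-segment assignment, and all tensor shapes are assumptions.

```python
import numpy as np

def weak_motion_loss(slot_probs, motion_seg):
    """Weak supervision from partial motion segments.

    slot_probs : (K, H, W) array, softmax over K object slots per pixel.
    motion_seg : (H, W) int array; 0 = static/unlabeled, >0 = moving-object id.
    Unlabeled pixels contribute nothing, so the signal is weak: it covers
    only the (few) objects the motion segmenter happened to catch.
    """
    loss, eps = 0.0, 1e-8
    for obj_id in np.unique(motion_seg):
        if obj_id == 0:
            continue  # background / unlabeled pixels give no learning signal
        mask = motion_seg == obj_id
        # Greedily assign the slot with the largest mass on this moving object.
        overlap = slot_probs[:, mask].sum(axis=1)
        k = overlap.argmax()
        # Negative log-likelihood of that slot on the object's pixels.
        loss += -np.log(slot_probs[k, mask] + eps).mean()
    return loss
```

A model trained this way is only penalized on pixels the motion segmenter labeled, yet — as the abstract argues — the appearance features it learns there generalize to static instances of the same object categories.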