Paper Title

Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation

Paper Authors

Xiaokang Chen, Kwan-Yee Lin, Jingbo Wang, Wayne Wu, Chen Qian, Hongsheng Li, Gang Zeng

Paper Abstract

Depth information has proven to be a useful cue in the semantic segmentation of RGB-D images, providing a geometric counterpart to the RGB representation. Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels, and model the problem as cross-modal feature fusion to obtain better feature representations for more accurate segmentation. This, however, may not lead to satisfactory results, as real-world depth data are generally noisy, which might worsen accuracy as the networks go deeper. In this paper, we propose a unified and efficient Cross-modality Guided Encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately. The key to the proposed architecture is a novel Separation-and-Aggregation Gating operation that jointly filters and recalibrates both representations before cross-modality aggregation. Meanwhile, a Bi-direction Multi-step Propagation strategy is introduced, on the one hand to help propagate and fuse information between the two modalities, and on the other hand to preserve their specificity along the long-term propagation process. In addition, our proposed encoder can be easily injected into previous encoder-decoder structures to boost their performance on RGB-D semantic segmentation. Our model consistently outperforms state-of-the-art methods on challenging indoor and outdoor datasets. The code for this work is available at https://charlescxk.github.io/
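
The project page above hosts the authors' official code; the PyTorch sketch below is only a rough illustration, reconstructed from the abstract alone, of what a separation-and-aggregation style gate could look like. The class name `SAGateSketch`, the squeeze-and-excitation-style channel gates used for the "separation" (filter-and-recalibrate) step, and the two-way spatial softmax used for the "aggregation" step are all assumptions made for illustration, not the authors' actual implementation.

```python
import torch
import torch.nn as nn


class SAGateSketch(nn.Module):
    """Illustrative sketch of a separation-and-aggregation style gate.

    Based only on the abstract: jointly filter and recalibrate the RGB
    and depth feature maps, then aggregate them. All design choices
    below (channel gating, reduction ratio, pixel-wise softmax mixing)
    are assumptions, not the authors' released implementation.
    """

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Separation: channel-wise recalibration of each modality,
        # conditioned on a descriptor pooled from both modalities.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.rgb_gate = nn.Sequential(
            nn.Linear(2 * channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )
        self.depth_gate = nn.Sequential(
            nn.Linear(2 * channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )
        # Aggregation: pixel-wise soft selection between the two
        # recalibrated feature maps.
        self.attn = nn.Conv2d(2 * channels, 2, kernel_size=1)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor):
        b, c, _, _ = rgb.shape
        # Global cross-modality descriptor.
        desc = torch.cat([self.pool(rgb), self.pool(depth)], dim=1).view(b, 2 * c)
        # Recalibrate each modality, suppressing noisy channel responses.
        rgb_rec = rgb * self.rgb_gate(desc).view(b, c, 1, 1)
        depth_rec = depth * self.depth_gate(desc).view(b, c, 1, 1)
        # Per-pixel mixing weights over the two recalibrated streams.
        weights = self.softmax(self.attn(torch.cat([rgb_rec, depth_rec], dim=1)))
        fused = rgb_rec * weights[:, 0:1] + depth_rec * weights[:, 1:2]
        # Return both recalibrated streams plus the aggregated feature.
        return rgb_rec, depth_rec, fused


# Example usage with hypothetical feature-map sizes.
gate = SAGateSketch(channels=256)
rgb_feat = torch.randn(2, 256, 60, 80)
depth_feat = torch.randn(2, 256, 60, 80)
rgb_out, depth_out, fused = gate(rgb_feat, depth_feat)  # each (2, 256, 60, 80)
```

Returning the two recalibrated streams alongside the fused map loosely mirrors the abstract's bi-directional multi-step propagation idea: each modality keeps its own (recalibrated) stream across encoder stages, preserving its specificity, while the gate exchanges information between them at every stage.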
