PanoFormer: Panorama Transformer for Indoor 360 Depth Estimation

Paper title

PanoFormer: Panorama Transformer for Indoor 360 Depth Estimation

Authors

Zhijie Shen, Chunyu Lin, Kang Liao, Lang Nie, Zishuo Zheng, Yao Zhao

Abstract

Existing panoramic depth estimation methods based on convolutional neural networks (CNNs) focus on removing panoramic distortions, failing to perceive panoramic structures efficiently due to the fixed receptive field in CNNs. This paper proposes the panorama transformer (named PanoFormer) to estimate the depth in panorama images, with tangent patches from spherical domain, learnable token flows, and panorama specific metrics. In particular, we divide patches on the spherical tangent domain into tokens to reduce the negative effect of panoramic distortions. Since the geometric structures are essential for depth estimation, a self-attention module is redesigned with an additional learnable token flow. In addition, considering the characteristic of the spherical domain, we present two panorama-specific metrics to comprehensively evaluate the panoramic depth estimation models' performance. Extensive experiments demonstrate that our approach significantly outperforms the state-of-the-art (SOTA) methods. Furthermore, the proposed method can be effectively extended to solve semantic panorama segmentation, a similar pixel2pixel task. Code will be available.
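The abstract does not spell out how tangent patches are sampled. The sketch below is one possible illustration, assuming a standard gnomonic (tangent-plane) projection onto an equirectangular panorama. The function names, the patch size, and the `half_extent` parameter are hypothetical and are not taken from the paper or its code.

```python
import numpy as np


def tangent_patch_lonlat(lon0, lat0, half_extent=0.1, size=3):
    """Sample a size x size grid on the plane tangent to the unit sphere
    at (lon0, lat0) and return the longitude/latitude of each sample.

    Assumption: a plain gnomonic projection; half_extent is the half-width
    of the patch measured on the tangent plane (hypothetical parameter).
    """
    ticks = np.linspace(-half_extent, half_extent, size)
    x, y = np.meshgrid(ticks, ticks)  # x grows eastward, y grows northward

    # Orthonormal frame at the tangent point: outward normal, east, north.
    normal = np.array([np.cos(lat0) * np.cos(lon0),
                       np.cos(lat0) * np.sin(lon0),
                       np.sin(lat0)])
    east = np.array([-np.sin(lon0), np.cos(lon0), 0.0])
    north = np.array([-np.sin(lat0) * np.cos(lon0),
                      -np.sin(lat0) * np.sin(lon0),
                      np.cos(lat0)])

    # Points on the tangent plane, then projected back onto the unit sphere.
    points = normal + x[..., None] * east + y[..., None] * north
    points /= np.linalg.norm(points, axis=-1, keepdims=True)

    lat = np.arcsin(points[..., 2])
    lon = np.arctan2(points[..., 1], points[..., 0])
    return lon, lat


def lonlat_to_pixel(lon, lat, width, height):
    """Map longitude/latitude to equirectangular pixel coordinates.

    Assumption: lon in [-pi, pi) spans the image width and lat = +pi/2
    is the top row.
    """
    u = ((lon / (2 * np.pi) + 0.5) * width) % width  # wrap horizontally
    v = (0.5 - lat / np.pi) * height
    return u, v
```

Under these assumptions, calling `tangent_patch_lonlat` at a token's center and passing the result to `lonlat_to_pixel` gives the panorama locations to gather features from. Near the poles the sampled footprint spreads horizontally in the equirectangular image, which is the distortion the tangent-domain sampling is meant to account for.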
