Paper Title

Generalizable Person Re-Identification via Viewpoint Alignment and Fusion

Paper Authors

Bingliang Jiao, Lingqiao Liu, Liying Gao, Guosheng Lin, Ruiqi Wu, Shizhou Zhang, Peng Wang, Yanning Zhang

Paper Abstract

Most current domain generalization work on person re-identification (ReID) focuses on handling style differences between domains while largely ignoring unpredictable camera viewpoint changes, which we identify as another major factor behind the poor generalization of ReID methods. To tackle viewpoint changes, this work proposes to use a 3D dense pose estimation model and a texture mapping module to map pedestrian images to canonical view images. Due to the imperfection of the texture mapping module, the canonical view images may lose the discriminative detail clues of the original images, so using them directly for ReID will inevitably result in poor performance. To handle this issue, we propose to fuse the original image and the canonical view image via a transformer-based module. The key insight of this design is that the cross-attention mechanism in the transformer can be an ideal way to align the discriminative texture clues of the original image with the canonical view image, compensating for the low-quality texture information of the canonical view image. Through extensive experiments, we show that our method leads to superior performance over existing approaches in various evaluation settings.
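
The fusion design described in the abstract can be made concrete with a minimal PyTorch sketch. This is not the authors' implementation: the module name `CrossViewFusion`, the token dimensions, and the single-layer structure are illustrative assumptions. It only shows the core idea, namely letting canonical-view tokens query original-image tokens through cross-attention so that discriminative texture from the original image is aligned to and fused into the canonical-view representation.

```python
# Minimal sketch of cross-attention fusion between original-image and
# canonical-view tokens. All names and sizes are illustrative assumptions,
# not the paper's actual architecture.
import torch
import torch.nn as nn


class CrossViewFusion(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        # batch_first=True -> inputs are (batch, tokens, dim)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, canon_tokens: torch.Tensor, orig_tokens: torch.Tensor) -> torch.Tensor:
        # Queries: canonical-view tokens; keys/values: original-image tokens.
        # Each canonical token attends to the original-image locations that
        # best match it, pulling in their discriminative texture clues.
        attended, _ = self.cross_attn(canon_tokens, orig_tokens, orig_tokens)
        # Residual connection keeps the canonical-view content and adds the
        # aligned texture information on top of it.
        return self.norm(canon_tokens + attended)


if __name__ == "__main__":
    fusion = CrossViewFusion(dim=256, num_heads=8)
    canon = torch.randn(2, 48, 256)  # tokens from the canonical view image
    orig = torch.randn(2, 48, 256)   # tokens from the original image
    fused = fusion(canon, orig)      # (2, 48, 256) fused tokens
    print(fused.shape)
```

In the full method, several such blocks would presumably be stacked inside a transformer and the fused tokens fed to the ReID head; the sketch only isolates how cross-attention performs the alignment that compensates for the canonical view's low-quality texture.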
