Paper Title
3D Crowd Counting via Geometric Attention-guided Multi-View Fusion
Paper Authors
Paper Abstract
Recently, multi-view crowd counting using deep neural networks has been proposed to enable counting in large and wide scenes using multiple cameras. Current methods project the camera-view features onto the average-height plane of the 3D world, and then fuse the projected multi-view features to predict a 2D scene-level density map on the ground (i.e., bird's-eye view). Unlike previous research, we consider the variable height of people in the 3D world and propose to solve the multi-view crowd counting task through 3D feature fusion with 3D scene-level density maps, instead of a 2D density map on the ground plane. Compared to 2D fusion, 3D fusion extracts more information about people along the z-dimension (height), which helps to address scale variations across multiple views. The 3D density maps still preserve the 2D density map property that the sum is the count, while also providing 3D information about the crowd density. Furthermore, instead of using the standard method of copying the features along the view ray in the 2D-to-3D projection, we propose an attention module based on a height estimation network, which forces each 2D pixel to be projected to one 3D voxel along the view ray. We also explore the projection consistency between the 3D prediction and the ground truth in the 2D views to further enhance the counting performance. The proposed method is tested on synthetic and real-world multi-view counting datasets and achieves better or comparable counting performance compared to the state-of-the-art.
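Two ideas from the abstract can be illustrated with a small sketch: the property that a 3D density map sums to the count (and still does so after collapsing the height axis), and the use of height-dependent weights instead of copying a pixel feature to every voxel on its view ray. The code below is only an illustration under stated assumptions, not the authors' implementation: the function names, the toy volume, and the use of a softmax over hypothetical height scores are all assumptions made here. In particular, the softmax is a soft stand-in for the paper's attention module, which is described as assigning each pixel to a single voxel.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def lift_pixel_feature(feature, height_scores):
    """Spread one 2D pixel feature over the voxels on its view ray.

    height_scores is a hypothetical per-voxel score (assumed here to come
    from some height estimator). Copying would place `feature` in every
    voxel; weighting keeps the total mass equal to `feature`.
    """
    weights = softmax(height_scores)
    return [feature * w for w in weights]


def count_from_3d_density(volume):
    """Sum all voxels of volume[z][y][x] to obtain the crowd count."""
    return sum(v for plane in volume for row in plane for v in row)


def collapse_height(volume):
    """Sum over the height axis to get a 2D ground-plane density map."""
    depth, rows, cols = len(volume), len(volume[0]), len(volume[0][0])
    return [
        [sum(volume[z][y][x] for z in range(depth)) for x in range(cols)]
        for y in range(rows)
    ]


# Toy 3D density map with two height levels on a 2x2 ground grid.
toy_volume = [
    [[0.5, 0.0], [0.0, 0.25]],  # lower height level
    [[0.5, 0.0], [0.0, 0.75]],  # upper height level
]
toy_count = count_from_3d_density(toy_volume)
toy_ground_map = collapse_height(toy_volume)

# One pixel feature lifted onto a ray of three voxels (scores are made up).
lifted = lift_pixel_feature(2.0, [0.1, 3.0, 0.2])
```

In this toy example the total over the 3D volume and the total over the collapsed 2D map agree, which is the count-preserving property the abstract refers to, and the lifted feature concentrates on the voxel with the highest score rather than being duplicated along the ray.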