Paper Title
Cross-View Cross-Scene Multi-View Crowd Counting
Authors
Abstract
Multi-view crowd counting has previously been proposed to use multiple cameras to extend the field-of-view of a single camera, capture more people in the scene, and improve counting performance for occluded or low-resolution people. However, the current multi-view paradigm trains and tests on the same single scene and camera views, which limits its practical application. In this paper, we propose a cross-view cross-scene (CVCS) multi-view crowd counting paradigm, where training and testing occur on different scenes with arbitrary camera layouts. To dynamically handle the challenges of optimal view fusion under scene and camera-layout changes, and of non-correspondence noise due to camera calibration errors or erroneous features, we propose a CVCS model that attentively selects and fuses multiple views using camera-layout geometry, together with a noise-view regularization method that trains the model to handle non-correspondence errors. We also generate a large synthetic multi-camera crowd counting dataset with a large number of scenes and camera views to capture many possible variations, which avoids the difficulty of collecting and annotating such a large real dataset. We then test the trained CVCS model on real multi-view counting datasets using unsupervised domain transfer. The proposed CVCS model trained on synthetic data outperforms the same model trained only on real data, and achieves promising performance compared to fully supervised methods that train and test on the same single scene.
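The attentive view selection and fusion described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function name `attentive_view_fusion` and the assumption that per-view features have already been projected onto a common ground plane, with per-view attention logits (`view_scores`) predicted elsewhere from features and camera-layout geometry, are illustrative choices.

```python
import numpy as np

def attentive_view_fusion(view_feats, view_scores):
    """Fuse per-view ground-plane feature maps with per-location
    attention weights computed by a softmax across views.

    view_feats:  (N, H, W, C) features from N camera views, assumed
                 already projected onto a common ground plane.
    view_scores: (N, H, W) unnormalized attention logits per view,
                 e.g. predicted from features and camera-layout geometry.
    Returns:     (H, W, C) fused ground-plane feature map.
    """
    # Numerically stable softmax over the view axis, so the weights
    # for the N views sum to 1 at every ground-plane location.
    s = view_scores - view_scores.max(axis=0, keepdims=True)
    w = np.exp(s)
    w /= w.sum(axis=0, keepdims=True)
    # Attention-weighted sum of the views at every location.
    return (w[..., None] * view_feats).sum(axis=0)
```

A view with a much higher score at some location dominates the fused output there, which is how the model can down-weight views that contribute only non-correspondence noise at that location.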