MEVA: A Large-Scale Multiview, Multimodal Video Dataset for Activity Detection

Paper Title

MEVA: A Large-Scale Multiview, Multimodal Video Dataset for Activity Detection

Authors

Kellie Corona, Katie Osterdahl, Roderic Collins, Anthony Hoogs

Abstract

We present the Multiview Extended Video with Activities (MEVA) dataset, a new and very-large-scale dataset for human activity recognition. Existing security datasets either focus on activity counts by aggregating public video disseminated due to its content, which typically excludes same-scene background video, or they achieve persistence by observing public areas and thus cannot control for activity content. Our dataset is over 9300 hours of untrimmed, continuous video, scripted to include diverse, simultaneous activities, along with spontaneous background activity. We have annotated 144 hours for 37 activity types, marking bounding boxes of actors and props. Our collection observed approximately 100 actors performing scripted scenarios and spontaneous background activity over a three-week period at an access-controlled venue, collecting in multiple modalities with overlapping and non-overlapping indoor and outdoor viewpoints. The resulting data includes video from 38 RGB and thermal IR cameras, 42 hours of UAV footage, as well as GPS locations for the actors. 122 hours of annotation are sequestered in support of the NIST Activity in Extended Video (ActEV) challenge; the other 22 hours of annotation and the corresponding video are available on our website, along with an additional 306 hours of ground camera data, 4.6 hours of UAV data, and 9.6 hours of GPS logs. Additional derived data includes camera models geo-registering the outdoor cameras and a dense 3D point cloud model of the outdoor scene. The data was collected with IRB oversight and approval and released under a CC-BY-4.0 license.
