Paper Title

MBW: Multi-view Bootstrapping in the Wild

Paper Authors

Dabhi, Mosam, Wang, Chaoyang, Clifford, Tim, Jeni, Laszlo Attila, Fasel, Ian R., Lucey, Simon

Abstract

Labeling articulated objects in unconstrained settings has a wide variety of applications including entertainment, neuroscience, psychology, ethology, and many fields of medicine. Large offline labeled datasets do not exist for all but the most common articulated object categories (e.g., humans). Hand labeling these landmarks within a video sequence is a laborious task. Learned landmark detectors can help, but can be error-prone when trained from only a few examples. Multi-camera systems that train fine-grained detectors have shown significant promise in detecting such errors, allowing for self-supervised solutions that only need a small percentage of the video sequence to be hand-labeled. The approach, however, is based on calibrated cameras and rigid geometry, making it expensive, difficult to manage, and impractical in real-world scenarios. In this paper, we address these bottlenecks by combining a non-rigid 3D neural prior with deep flow to obtain high-fidelity landmark estimates from videos with only two or three uncalibrated, handheld cameras. With just a few annotations (representing 1-2% of the frames), we are able to produce 2D results comparable to state-of-the-art fully supervised methods, along with 3D reconstructions that are impossible with other existing approaches. Our Multi-view Bootstrapping in the Wild (MBW) approach demonstrates impressive results on standard human datasets, as well as tigers, cheetahs, fish, colobus monkeys, chimpanzees, and flamingos from videos captured casually in a zoo. We release the codebase for MBW as well as this challenging zoo dataset, consisting of image frames of tail-end distribution categories with their corresponding 2D and 3D labels generated with minimal human intervention.
