Paper Title
Aligning Videos in Space and Time
Paper Authors
Paper Abstract
In this paper, we focus on the task of extracting visual correspondences across videos. Given a query video clip from an action class, we aim to align it with training videos in space and time. Obtaining training data for such a fine-grained alignment task is challenging and often ambiguous. Hence, we propose a novel alignment procedure that learns such correspondences in space and time via cross-video cycle-consistency. During training, given a pair of videos, we compute cycles that connect patches in a given frame of the first video by matching through frames of the second video. Cycles that connect overlapping patches are encouraged to score higher than cycles that connect non-overlapping patches. Our experiments on the Penn Action and Pouring datasets demonstrate that the proposed method successfully learns to correspond semantically similar patches across videos, and learns representations that are sensitive to object and action states.
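To make the training objective concrete, below is a minimal sketch in PyTorch of a cross-video cycle-consistency score of the kind the abstract describes: patches in a frame of the first video are softly matched to patches pooled from the second video and then matched back, and cycles that return to overlapping patches are pushed to outscore cycles that return to non-overlapping ones. This assumes patch embeddings already extracted by some backbone; the function names, tensor shapes, soft-matching formulation, and margin loss are all illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def cycle_scores(patches_a, patches_b, temperature=0.1):
    """Score round-trip cycles from frame A through video B and back.

    patches_a: (Na, D) patch embeddings from one frame of the first video.
    patches_b: (Nb, D) patch embeddings pooled from frames of the second video.
    Returns an (Na, Na) matrix whose entry (i, j) is the soft probability of a
    cycle that starts at patch i in A, passes through B, and lands on patch j in A.
    """
    a = F.normalize(patches_a, dim=1)
    b = F.normalize(patches_b, dim=1)
    fwd = F.softmax(a @ b.t() / temperature, dim=1)  # soft assignment A -> B
    bwd = F.softmax(b @ a.t() / temperature, dim=1)  # soft assignment B -> A
    return fwd @ bwd                                 # composed round-trip probabilities

def cycle_consistency_loss(patches_a, patches_b, overlap, margin=0.5):
    """Hinge loss on cycle scores.

    overlap: (Na, Na) boolean matrix; overlap[i, j] is True when patches i and j
    in frame A spatially overlap. Cycles ending on overlapping patches should
    score at least `margin` higher than cycles ending on non-overlapping ones.
    """
    scores = cycle_scores(patches_a, patches_b)
    pos = (scores * overlap).sum(dim=1) / overlap.sum(dim=1).clamp(min=1)
    neg = (scores * ~overlap).sum(dim=1) / (~overlap).sum(dim=1).clamp(min=1)
    return F.relu(margin - (pos - neg)).mean()

# Toy usage with random embeddings (shapes are illustrative):
Na, Nb, D = 49, 196, 128
pa = torch.randn(Na, D, requires_grad=True)
pb = torch.randn(Nb, D, requires_grad=True)
overlap = torch.eye(Na, dtype=torch.bool)  # simplest case: each patch overlaps only itself
loss = cycle_consistency_loss(pa, pb, overlap)
loss.backward()
```

The margin formulation mirrors the relative constraint stated in the abstract: a cycle need not return exactly to its starting patch, only outscore cycles that return to non-overlapping patches, which is what makes the supervision signal tolerant of the ambiguity in fine-grained alignment.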