A Unified Framework for Shot Type Classification Based on Subject Centric Lens

Paper Title

A Unified Framework for Shot Type Classification Based on Subject Centric Lens

Authors

Anyi Rao, Jiaze Wang, Linning Xu, Xuekun Jiang, Qingqiu Huang, Bolei Zhou, Dahua Lin

Abstract

Shots are key narrative elements of various videos, e.g., movies, TV series, and user-generated videos that are thriving on the Internet. The types of shots greatly influence how the underlying ideas, emotions, and messages are expressed. Techniques for analyzing shot types are important for video understanding, which has seen increasing demand in real-world applications. Classifying shot type is challenging because it requires information beyond the video content, such as the spatial composition of a frame and camera movement. To address these issues, we propose a learning framework, Subject Guidance Network (SGNet), for shot type recognition. SGNet separates the subject and background of a shot into two streams, which serve as separate guidance maps for scale and movement type classification, respectively. To facilitate shot type analysis and model evaluation, we build a large-scale dataset, MovieShots, which contains 46K shots from 7K movie trailers with annotations of their scale and movement types. Experiments show that our framework is able to recognize these two attributes of a shot accurately, outperforming all previous methods.
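The subject/background separation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the subject is located or how the guidance maps are consumed, so the function name `split_subject_background` and the soft mask input (values in [0, 1], assumed to come from some saliency or segmentation step) are hypothetical and used only for illustration.

```python
from typing import List, Tuple

# A frame is modeled as a 2D grid of pixel intensities (illustrative only).
Frame = List[List[float]]


def split_subject_background(frame: Frame, subject_mask: Frame) -> Tuple[Frame, Frame]:
    """Split a frame into a subject map and a background map.

    Assumption: subject_mask holds values in [0, 1], where 1 marks a subject
    pixel. How such a mask is obtained is not stated in the abstract.
    """
    same_shape = len(frame) == len(subject_mask) and all(
        len(f_row) == len(m_row) for f_row, m_row in zip(frame, subject_mask)
    )
    if not same_shape:
        raise ValueError("frame and subject_mask must have the same shape")

    # Subject stream: keep pixels weighted by the mask.
    subject = [
        [p * m for p, m in zip(f_row, m_row)]
        for f_row, m_row in zip(frame, subject_mask)
    ]
    # Background stream: keep pixels weighted by the complement of the mask.
    background = [
        [p * (1.0 - m) for p, m in zip(f_row, m_row)]
        for f_row, m_row in zip(frame, subject_mask)
    ]
    return subject, background
```

Under the abstract's description, the subject map would guide scale classification and the background map would guide movement classification; the classifiers themselves are outside the scope of this sketch.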
