Temporal and Contextual Transformer for Multi-Camera Editing of TV Shows

Paper Title

Temporal and Contextual Transformer for Multi-Camera Editing of TV Shows

Authors

Rao, Anyi; Jiang, Xuekun; Wang, Sichen; Guo, Yuwei; Liu, Zihao; Dai, Bo; Pang, Long; Wu, Xiaoyu; Lin, Dahua; Jin, Libiao

Abstract

The ability to choose an appropriate camera view among multiple cameras plays a vital role in TV show delivery. However, it is hard to identify the underlying statistical patterns and apply intelligent processing because high-quality training data are lacking. To address this issue, we first collect a novel benchmark for this setting covering four diverse scenarios (concerts, sports games, gala shows, and contests), where each scenario contains 6 synchronized tracks recorded by different cameras. The benchmark comprises 88 hours of raw video, from which 14 hours of edited video were produced. Based on this benchmark, we further propose a new approach, the Temporal and Contextual Transformer, which uses clues from historical shots and from other views to make shot transition decisions and to predict which view to use. Extensive experiments show that our method outperforms existing methods on the proposed multi-camera editing benchmark.
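To make the two kinds of clues concrete, the following is a minimal sketch of how temporal attention (over previously shown shots) and contextual attention (over the other synchronized views) could be combined to pick the next camera view. It is not the authors' architecture: the function names (`attend`, `select_view`), the additive fusion, the linear scoring vector `w`, and the rule that a shot transition happens whenever the top-scoring view differs from the one on air are all illustrative assumptions, and the features are random rather than learned.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    z = np.exp(x - x.max())
    return z / z.sum()

def attend(query, keys, values):
    """Single-head scaled dot-product attention.
    query: (d,), keys/values: (n, d) -> weighted sum of values, shape (d,)."""
    weights = softmax(keys @ query / np.sqrt(query.shape[0]))
    return weights @ values

def select_view(history, views, current_view, w):
    """Hypothetical view selector, NOT the paper's model.

    history: (T, d) features of shots already shown (temporal clues)
    views:   (V, d) features of the current moment from each synchronized camera
             (contextual clues from the other views)
    current_view: index of the camera currently on air
    w: (d,) scoring vector; random here, it would be learned in a real system
    Returns (switch, next_view)."""
    scores = np.empty(views.shape[0])
    for v, q in enumerate(views):
        temporal = attend(q, history, history)   # what has been shown so far
        contextual = attend(q, views, views)     # what all cameras see right now
        scores[v] = (q + temporal + contextual) @ w
    next_view = int(np.argmax(scores))
    # Simplifying assumption: a transition occurs iff the best view changes.
    return next_view != current_view, next_view

rng = np.random.default_rng(0)
d, T, V = 16, 8, 6  # 6 synchronized tracks, as in each benchmark scenario
switch, nxt = select_view(rng.normal(size=(T, d)), rng.normal(size=(V, d)),
                          current_view=0, w=rng.normal(size=d))
print(switch, nxt)
```

In a trained system, `w` and any attention projections would be fitted to the professionally edited tracks; the random inputs here only check that the shapes and the switching logic fit together.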
