Sporthesia: Augmenting Sports Videos Using Natural Language

Paper Title

Sporthesia: Augmenting Sports Videos Using Natural Language

Authors

Zhu-Tian Chen, Qisen Yang, Xiao Xie, Johanna Beyer, Haijun Xia, Yingcai Wu, Hanspeter Pfister

Abstract

Augmented sports videos, which combine visualizations and video effects to present data in actual scenes, can communicate insights engagingly and thus have been increasingly popular for sports enthusiasts around the world. Yet, creating augmented sports videos remains a challenging task, requiring considerable time and video editing skills. On the other hand, sports insights are often communicated using natural language, such as in commentaries, oral presentations, and articles, but usually lack visual cues. Thus, this work aims to facilitate the creation of augmented sports videos by enabling analysts to directly create visualizations embedded in videos using insights expressed in natural language. To achieve this goal, we propose a three-step approach - 1) detecting visualizable entities in the text, 2) mapping these entities into visualizations, and 3) scheduling these visualizations to play with the video - and analyzed 155 sports video clips and the accompanying commentaries for accomplishing these steps. Informed by our analysis, we have designed and implemented Sporthesia, a proof-of-concept system that takes racket-based sports videos and textual commentaries as the input and outputs augmented videos. We demonstrate Sporthesia's applicability in two exemplar scenarios, i.e., authoring augmented sports videos using text and augmenting historical sports videos based on auditory comments. A technical evaluation shows that Sporthesia achieves high accuracy (F1-score of 0.9) in detecting visualizable entities in the text. An expert evaluation with eight sports analysts suggests high utility, effectiveness, and satisfaction with our language-driven authoring method and provides insights for future improvement and opportunities.
