Title
Video SemNet: Memory-Augmented Video Semantic Network
Authors
Abstract
Stories are a compelling medium for conveying ideas, experiences, and social and cultural values. A narrative is a specific manifestation of a story that turns it into knowledge for the audience. In this paper, we propose a machine learning approach to capture the narrative elements in movies by bridging the gap between low-level data representations and the semantic aspects of the visual medium. We present a Memory-Augmented Video Semantic Network, called Video SemNet, to encode semantic descriptors and learn an embedding for the video. The model employs two main components: (i) a neural semantic learner that learns latent embeddings of semantic descriptors and (ii) a memory module that retains and memorizes specific semantic patterns from the video. We evaluate the video representations obtained from variants of our model on two tasks: (a) genre prediction and (b) IMDB rating prediction. We demonstrate that our model predicts genres and IMDB ratings with weighted F-1 scores of 0.72 and 0.63, respectively. The results are indicative of the representational power of our model and the ability of such representations to measure audience engagement.
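To make the two-component design concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of how a semantic learner and a memory module might combine: descriptor IDs are embedded and pooled into a query, which attends over a fixed set of memory slots by cosine similarity to produce a video embedding. All names, dimensions, and the attention scheme here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class VideoSemNetSketch:
    """Illustrative sketch of a memory-augmented semantic encoder."""

    def __init__(self, vocab_size, embed_dim, memory_slots):
        # (i) semantic learner: an embedding table for semantic descriptors
        # (in practice these would be learned parameters, not random)
        self.embed = rng.normal(size=(vocab_size, embed_dim))
        # (ii) memory module: slots that retain semantic patterns
        self.memory = rng.normal(size=(memory_slots, embed_dim))

    def encode(self, descriptor_ids):
        # pool descriptor embeddings into a single query vector
        q = self.embed[descriptor_ids].mean(axis=0)
        # attend over memory slots via cosine similarity + softmax
        sims = self.memory @ q / (
            np.linalg.norm(self.memory, axis=1) * np.linalg.norm(q) + 1e-8
        )
        w = np.exp(sims - sims.max())
        w /= w.sum()
        # video embedding combines the query with retrieved memory content
        return q + w @ self.memory

net = VideoSemNetSketch(vocab_size=100, embed_dim=16, memory_slots=8)
video_embedding = net.encode([3, 17, 42])
print(video_embedding.shape)  # (16,)
```

The resulting fixed-size vector could then feed a classifier for genre prediction or a regressor for IMDB ratings, matching the two evaluation tasks described above.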