Paper Title

STAR-GNN: Spatial-Temporal Video Representation for Content-based Retrieval

Authors

Guoping Zhao, Bingqing Zhang, Mingyu Zhang, Yaxian Li, Jiajun Liu, Ji-Rong Wen

Abstract

We propose a video feature representation learning framework called STAR-GNN, which applies a pluggable graph neural network component on a multi-scale lattice feature graph. The essence of STAR-GNN is to exploit both the temporal dynamics and spatial contents, as well as visual connections between regions at different scales in the frames. It models a video with a lattice feature graph in which the nodes represent regions of different granularity, with weighted edges that represent spatial and temporal links. The contextual nodes are aggregated simultaneously by graph neural networks whose parameters are trained with a retrieval triplet loss. In the experiments, we show that STAR-GNN effectively implements a dynamic attention mechanism on video frame sequences, resulting in an emphasis on dynamic and semantically rich content in the video, and is robust to noise and redundancies. Empirical results show that STAR-GNN achieves state-of-the-art performance for content-based video retrieval.
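The abstract describes three mechanical pieces: a lattice feature graph (multi-scale region nodes with weighted spatial and temporal edges), neighborhood aggregation by a GNN, and training with a retrieval triplet loss. The sketch below illustrates those pieces in plain PyTorch. It is not the authors' implementation: the class and function names (StarGNNLayerSketch, build_lattice_adjacency), the uniform edge weights, the region layout, and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StarGNNLayerSketch(nn.Module):
    """One aggregation step over the lattice graph (illustrative only)."""

    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x:   (N, dim) node features, one per multi-scale region
        # adj: (N, N) weighted adjacency encoding spatial + temporal links
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1e-6)
        msg = (adj @ x) / deg              # weighted neighbor average
        return F.relu(self.proj(msg)) + x  # residual update


def build_lattice_adjacency(num_frames: int, regions_per_frame: int) -> torch.Tensor:
    """Toy lattice: fully connect regions within a frame (spatial edges) and
    link corresponding regions of consecutive frames (temporal edges).
    Uniform weights are placeholders; the paper uses weighted links."""
    n = num_frames * regions_per_frame
    adj = torch.zeros(n, n)
    idx = torch.arange(regions_per_frame)
    for f in range(num_frames):
        s = f * regions_per_frame
        adj[s:s + regions_per_frame, s:s + regions_per_frame] = 1.0  # spatial
        if f + 1 < num_frames:
            t = s + regions_per_frame
            adj[s + idx, t + idx] = 1.0  # temporal, forward
            adj[t + idx, s + idx] = 1.0  # temporal, backward
    adj.fill_diagonal_(0.0)
    return adj


# Toy usage: 8 frames, 5 multi-scale regions per frame, 256-d features.
x = torch.randn(8 * 5, 256)
adj = build_lattice_adjacency(8, 5)
layer = StarGNNLayerSketch(256)
video_embedding = layer(x, adj).mean(dim=0, keepdim=True)  # pooled video vector

# Retrieval triplet loss over video embeddings, as the abstract describes.
triplet = nn.TripletMarginLoss(margin=0.2)
positive, negative = torch.randn(1, 256), torch.randn(1, 256)  # placeholders
loss = triplet(video_embedding, positive, negative)
loss.backward()
```

Adjacency-matrix message passing is used here only to keep the example dependency-free; the "pluggable" GNN component the abstract mentions could be any standard graph layer operating on the same lattice structure.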
