X3D: Expanding Architectures for Efficient Video Recognition

Title

X3D: Expanding Architectures for Efficient Video Recognition

Author

Feichtenhofer, Christoph

Abstract

This paper presents X3D, a family of efficient video networks that progressively expand a tiny 2D image classification architecture along multiple network axes, in space, time, width and depth. Inspired by feature selection methods in machine learning, a simple stepwise network expansion approach is employed that expands a single axis in each step, such that a good accuracy-to-complexity trade-off is achieved. To expand X3D to a specific target complexity, we perform progressive forward expansion followed by backward contraction. X3D achieves state-of-the-art performance while requiring 4.8x fewer multiply-adds and 5.5x fewer parameters to reach accuracy similar to previous work. Our most surprising finding is that networks with high spatiotemporal resolution can perform well while being extremely light in terms of network width and parameters. We report competitive accuracy at unprecedented efficiency on video classification and detection benchmarks. Code will be available at: https://github.com/facebookresearch/SlowFast
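
The abstract only outlines the search procedure. Below is a minimal Python sketch of greedy one-axis-per-step forward expansion followed by backward contraction, written under loudly stated assumptions: the four axes, the per-step multipliers in STEP, the cost model in cost(), the contraction rule (shrink only the last expanded axis), and the scoring function toy_evaluate are all hypothetical stand-ins invented for illustration. A real search would presumably train and validate each candidate network instead of calling a toy score. This is not code from the SlowFast repository.

```python
import math
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Config:
    """Hypothetical configuration along the four axes named in the abstract."""
    frames: float = 1.0        # time: input frames (1 frame = a 2D image model)
    resolution: float = 112.0  # space: input side length in pixels (assumed start)
    width: float = 1.0         # width: channel multiplier
    depth: float = 1.0         # depth: layer-count multiplier


# Assumed per-step multipliers: one step on any single axis roughly doubles
# the cost under the toy cost model below.
STEP = {"frames": 2.0, "resolution": math.sqrt(2.0), "width": math.sqrt(2.0), "depth": 2.0}
# Exponent of each axis in the toy cost model (also used to invert it).
COST_EXPONENT = {"frames": 1.0, "resolution": 2.0, "width": 2.0, "depth": 1.0}


def cost(c: Config) -> float:
    """Toy multiply-add proxy (an assumption, not a real FLOP counter)."""
    return math.prod(getattr(c, a) ** e for a, e in COST_EXPONENT.items())


def scale(c: Config, axis: str, factor: float) -> Config:
    """Return a copy of `c` with one axis multiplied by `factor`."""
    return replace(c, **{axis: getattr(c, axis) * factor})


def expand_then_contract(start: Config, evaluate: Callable[[Config], float],
                         target: float) -> Tuple[Config, List[str]]:
    """Greedy forward expansion (one axis per step), then backward contraction."""
    c, history = start, []
    # Forward: try expanding each axis alone and keep the best-scoring candidate.
    # All candidates cost about 2x the current model, so the best score is also
    # the best accuracy/complexity trade-off within this step.
    while cost(c) < target:
        candidates = {a: scale(c, a, f) for a, f in STEP.items()}
        best = max(candidates, key=lambda a: evaluate(candidates[a]))
        c = candidates[best]
        history.append(best)
    # Backward (simplified, assumed rule): if the last step overshot the budget,
    # shrink only the most recently expanded axis so the cost meets the target.
    if history and cost(c) > target:
        last = history[-1]
        c = scale(c, last, (target / cost(c)) ** (1.0 / COST_EXPONENT[last]))
    return c, history


# Hypothetical stand-in for "train the candidate and measure validation
# accuracy": a made-up score with diminishing returns on every axis.
WEIGHTS = {"frames": 1.0, "resolution": 0.9, "width": 0.4, "depth": 0.6}


def toy_evaluate(c: Config) -> float:
    base = Config()
    return sum(w * math.log1p(math.log2(getattr(c, a) / getattr(base, a)))
               for a, w in WEIGHTS.items())


if __name__ == "__main__":
    budget = 20 * cost(Config())  # arbitrary budget: 20x the starting cost
    final, steps = expand_then_contract(Config(), toy_evaluate, budget)
    print(steps, final, cost(final) / budget)
```

With the invented WEIGHTS the greedy loop grows more than one axis rather than a single one, and the contraction step lands the final cost on the budget. The actual step sizes, candidate axes, and contraction rule used for X3D should be taken from the paper and the SlowFast code, not from this sketch.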
