FLAG3D: A 3D Fitness Activity Dataset with Language Instruction

Paper Title

FLAG3D: A 3D Fitness Activity Dataset with Language Instruction

Authors

Tang, Yansong; Liu, Jinpeng; Liu, Aoyang; Yang, Bin; Dai, Wenxun; Rao, Yongming; Lu, Jiwen; Zhou, Jie; Li, Xiu

Abstract

With fitness activities growing steadily in popularity around the world, fitness activity analytics has become an emerging research topic in computer vision. While a variety of new tasks and algorithms have been proposed recently, there is a growing demand for data resources that offer high-quality data, fine-grained labels, and diverse environments. In this paper, we present FLAG3D, a large-scale 3D fitness activity dataset with language instruction, containing 180K sequences across 60 categories. FLAG3D has three key features: 1) accurate and dense 3D human poses captured with an advanced MoCap system to handle complex activities and large movements, 2) detailed, professional language instructions describing how to perform each activity, and 3) versatile video resources from a high-tech MoCap system, rendering software, and cost-effective smartphones in natural environments. Extensive experiments and in-depth analysis show that FLAG3D offers significant research value for a range of challenges, such as cross-domain human action recognition, dynamic human mesh recovery, and language-guided human action generation. Our dataset and source code are publicly available at https://andytang15.github.io/FLAG3D.
