Paper Title

Audio-Driven Co-Speech Gesture Video Generation

Authors

Xian Liu, Qianyi Wu, Hang Zhou, Yuanqi Du, Wayne Wu, Dahua Lin, Ziwei Liu

Abstract

Co-speech gesture is crucial for human-machine interaction and digital entertainment. While previous works mostly map speech audio to human skeletons (e.g., 2D keypoints), directly generating speakers' gestures in the image domain remains unsolved. In this work, we formally define and study the challenging problem of audio-driven co-speech gesture video generation, i.e., using a unified framework to generate a speaker image sequence driven by speech audio. Our key insight is that co-speech gestures can be decomposed into common motion patterns and subtle rhythmic dynamics. To this end, we propose a novel framework, Audio-driveN Gesture vIdeo gEneration (ANGIE), to effectively capture reusable co-speech gesture patterns as well as fine-grained rhythmic movements. To achieve high-fidelity image sequence generation, we leverage an unsupervised motion representation instead of a structural human body prior (e.g., 2D skeletons). Specifically, 1) we propose a vector quantized motion extractor (VQ-Motion Extractor) to summarize common co-speech gesture patterns from the implicit motion representation into codebooks. 2) Moreover, a co-speech gesture GPT with motion refinement (Co-Speech GPT) is devised to complement the subtle prosodic motion details. Extensive experiments demonstrate that our framework renders realistic and vivid co-speech gesture videos. Demo video and more resources can be found at: https://alvinliu0.github.io/projects/ANGIE
