Paper Title
Attention-Based Lip Audio-Visual Synthesis for Talking Face Generation in the Wild
Paper Authors
Paper Abstract
Talking face generation, a task of great practical significance, has attracted increasing attention in recent audio-visual studies. How to achieve accurate lip synchronization remains a long-standing challenge that calls for further investigation. Motivated by xxx, in this paper an AttnWav2Lip model is proposed by incorporating a spatial attention module and a channel attention module into the lip-syncing strategy. Rather than focusing on unimportant regions of the face image, the proposed AttnWav2Lip model is able to pay more attention to reconstruction of the lip region. To the best of our knowledge, this is the first attempt to introduce an attention mechanism into the talking face generation scheme. Extensive experiments have been conducted to evaluate the effectiveness of the proposed model. Compared with the baseline, superior performance, as measured by the LSE-D and LSE-C metrics, is demonstrated on benchmark lip synthesis datasets including LRW, LRS2 and LRS3.
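The abstract does not spell out how the spatial and channel attention modules operate, so the sketch below is only an illustration of the general idea, assuming the common squeeze-and-gate formulation (channel attention pools over spatial locations to reweight channels; spatial attention pools over channels to reweight locations). All weights, shapes, and the fixed equal-weight spatial gate are hypothetical placeholders, not the paper's actual AttnWav2Lip design:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    # x: (C, H, W). Squeeze spatial dims via average- and max-pooling,
    # pass both through a shared two-layer ReLU MLP, sum, and gate channels.
    avg = x.mean(axis=(1, 2))                      # (C,)
    mx = x.max(axis=(1, 2))                        # (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)   # (C,) -> (C,)
    gate = sigmoid(mlp(avg) + mlp(mx))             # per-channel weights in (0, 1)
    return x * gate[:, None, None]

def spatial_attention(x):
    # x: (C, H, W). Pool across channels, then gate each spatial location.
    # A learned module would typically convolve [avg; max]; a fixed
    # equal-weight combination keeps this sketch dependency-free.
    avg = x.mean(axis=0)                           # (H, W)
    mx = x.max(axis=0)                             # (H, W)
    gate = sigmoid(0.5 * (avg + mx))               # per-location weights in (0, 1)
    return x * gate[None, :, :]

# Toy feature map: 4 channels on an 8x8 spatial grid.
rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 8))
r = 2                                              # channel reduction ratio
w1 = rng.standard_normal((4 // r, 4)) * 0.1
w2 = rng.standard_normal((4, 4 // r)) * 0.1
out = spatial_attention(channel_attention(feat, w1, w2))
print(out.shape)  # (4, 8, 8): same shape, attention only rescales features
```

Because both gates lie in (0, 1), each module rescales the feature map without changing its shape, which is what lets such attention blocks be dropped into an existing generator (e.g. between convolutional layers) so that lip-region features are emphasized over less informative facial regions.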