Paper Title
Identifying Invariant Texture Violation for Robust Deepfake Detection
Paper Authors
Paper Abstract
Existing deepfake detection methods have reported promising in-distribution results by accessing published large-scale datasets. However, due to non-smooth synthesis methods, the fake samples in these datasets may expose obvious artifacts (e.g., stark visual contrast, non-smooth boundaries), on which most frame-level detection methods heavily rely. As these artifacts do not appear in real media forgeries, such methods can suffer a large degradation when applied to fake images that are close to reality. To improve robustness on high-realism fake data, we propose the Invariant Texture Learning (InTeLe) framework, which only accesses the published datasets with low visual quality. Our method is based on the prior that the microscopic facial texture of the source face is inevitably violated by the texture transferred from the target person, which can hence be regarded as an invariant characterization shared among all fake images. To learn such an invariance for deepfake detection, InTeLe introduces an auto-encoder framework with separate decoders for pristine and fake images, each further appended with a shallow classifier in order to separate out the obvious artifact effect. Equipped with such a separation, the embedding extracted by the encoder can capture the texture violation in fake images, followed by a classifier for the final pristine/fake prediction. As a theoretical guarantee, we prove the identifiability of such an invariant texture violation, i.e., that it can be precisely inferred from observational data. The effectiveness and utility of our method are demonstrated by its promising generalization from low-quality images with obvious artifacts to fake images with high realism.
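The architecture described in the abstract (a shared encoder, separate decoders for pristine and fake images, a shallow artifact classifier on each decoder branch, and a final pristine/fake classifier on the embedding) can be sketched at a very high level as below. This is a minimal illustration, not the paper's implementation: the linear maps stand in for the actual deep networks, and all dimensions (`D_IN`, `D_EMB`) and the function name `intele_forward` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; the paper does not specify them here.
D_IN, D_EMB = 64, 16

# Shared encoder: a single linear map standing in for a deep CNN encoder.
W_enc = rng.standard_normal((D_EMB, D_IN)) * 0.1

# Separate decoders for pristine and fake images, per the InTeLe design.
W_dec_pristine = rng.standard_normal((D_IN, D_EMB)) * 0.1
W_dec_fake = rng.standard_normal((D_IN, D_EMB)) * 0.1

# Shallow classifier appended to the decoder branch, intended to absorb
# the obvious artifact effect so the embedding keeps the texture violation.
w_shallow = rng.standard_normal(D_IN) * 0.1

# Final pristine/fake classifier acting on the encoder embedding.
w_cls = rng.standard_normal(D_EMB) * 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def intele_forward(x, is_fake):
    """One forward pass of the InTeLe-style pipeline (sketch only)."""
    z = W_enc @ x                                  # shared embedding
    W_dec = W_dec_fake if is_fake else W_dec_pristine
    recon = W_dec @ z                              # branch-specific reconstruction
    artifact_score = sigmoid(w_shallow @ recon)    # shallow artifact classifier
    fake_prob = sigmoid(w_cls @ z)                 # final pristine/fake prediction
    return recon, artifact_score, fake_prob

x = rng.standard_normal(D_IN)
recon, artifact_score, fake_prob = intele_forward(x, is_fake=True)
```

At training time, the two decoders would be optimized for reconstruction on their respective image types while the shallow classifiers soak up the easy artifact signal, so that `w_cls` is forced to rely on the invariant texture violation rather than dataset-specific artifacts.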