Multi-Task Siamese Neural Network for Improving Replay Attack Detection

Paper title

Multi-Task Siamese Neural Network for Improving Replay Attack Detection

Authors

Patrick von Platen, Fei Tao, Gokhan Tur

Abstract

Automatic speaker verification systems are vulnerable to audio replay attacks, which bypass security by replaying recordings of authorized speakers. Replay attack (RA) detection systems built upon Residual Neural Networks (ResNets) have yielded astonishing results on the public benchmark ASVspoof 2019 Physical Access challenge. However, with most teams using fine-tuned feature extraction pipelines and model architectures, the generalizability of such systems remains questionable. In this work, we analyse the effect that discriminative feature learning in a multi-task learning (MTL) setting can have on the generalizability and discriminability of RA detection systems. We use a popular ResNet architecture optimized by the cross-entropy criterion as our baseline and compare it to the same architecture optimized by MTL using Siamese Neural Networks (SNN). We show that the SNN outperforms the baseline by a relative 26.8% in Equal Error Rate (EER). We further enhance the model's architecture and demonstrate that an SNN with an additional reconstruction loss yields a further significant relative improvement of 13.8% in EER.
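The abstract does not spell out the multi-task objective, so the following is only a rough sketch of how a classification loss and a Siamese pairwise loss might be combined. It assumes a standard contrastive loss as the pairwise term, which may differ from the loss the authors use; the function names, the weight `alpha`, and the `margin` are hypothetical, and the reconstruction loss mentioned above is left out.

```python
import math


def cross_entropy(p_replay, label):
    """Binary cross-entropy for one utterance (label: 1 = replayed, 0 = genuine)."""
    eps = 1e-12  # clamp to avoid log(0)
    p = min(max(p_replay, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def euclidean(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_class, margin=1.0):
    """Assumed pairwise term: pull same-class pairs together,
    push different-class pairs at least `margin` apart."""
    d = euclidean(emb_a, emb_b)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2


def multitask_loss(p_a, p_b, emb_a, emb_b, label_a, label_b,
                   alpha=0.5, margin=1.0):
    """Classification loss on both branches plus a weighted pairwise loss.
    `alpha` is a hypothetical weighting, not a value from the paper."""
    ce = cross_entropy(p_a, label_a) + cross_entropy(p_b, label_b)
    pair = contrastive_loss(emb_a, emb_b, label_a == label_b, margin)
    return ce + alpha * pair
```

In this sketch both branches of the Siamese pair share the same network, so `p_a`, `p_b`, `emb_a`, and `emb_b` would come from two forward passes of one model; the pairwise term is what encourages genuine and replayed utterances to form separate clusters in the embedding space.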
