Paper Title
Scoring of Large-Margin Embeddings for Speaker Verification: Cosine or PLDA?
Paper Authors
Paper Abstract
The emergence of large-margin softmax cross-entropy losses in training deep speaker embedding neural networks has triggered a gradual shift from parametric back-ends to a simpler cosine similarity measure for speaker verification. Popular parametric back-ends include probabilistic linear discriminant analysis (PLDA) and its variants. This paper investigates the properties of margin-based cross-entropy losses that led to this shift and aims to identify the scoring back-ends best suited for speaker verification. In addition, we revisit pre-processing techniques that were widely used in the past and assess their effectiveness on large-margin embeddings. Experiments on state-of-the-art ECAPA-TDNN networks trained with various large-margin softmax cross-entropy losses show a substantial increase in intra-speaker compactness, rendering the conventional PLDA superfluous. In this regard, we find that constraining the within-speaker covariance matrix improves the performance of PLDA. This is demonstrated through a series of experiments on the VoxCeleb-1 and SITW core-core test sets, yielding a 40.8% reduction in equal error rate (EER) and a 35.1% reduction in minimum detection cost (minDCF). The constrained PLDA also consistently outperforms cosine scoring, reducing EER and minDCF by 10.9% and 4.9%, respectively.
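To make the contrast between the two back-ends concrete, below is a minimal sketch (not the authors' implementation) of cosine scoring versus a simplified two-covariance PLDA log-likelihood ratio. The between-speaker covariance B and within-speaker covariance W are random stand-ins that would normally be estimated from held-out embeddings; the near-spherical W is an assumption meant to loosely mimic the intra-speaker compactness of large-margin embeddings and the paper's constrained within-speaker covariance.

import numpy as np
from scipy.stats import multivariate_normal

def cosine_score(x1, x2):
    """Cosine similarity between two speaker embeddings."""
    return x1 @ x2 / (np.linalg.norm(x1) * np.linalg.norm(x2))

def plda_llr(x1, x2, B, W):
    """Two-covariance PLDA log-likelihood ratio for a trial (x1, x2),
    assuming mean-centered embeddings.

    H_same: both embeddings share one latent speaker mean drawn from N(0, B),
            so the joint covariance is [[B+W, B], [B, B+W]].
    H_diff: independent speakers, joint covariance [[B+W, 0], [0, B+W]].
    """
    d = len(x1)
    T = B + W  # total covariance of a single embedding
    joint = np.concatenate([x1, x2])
    cov_same = np.block([[T, B], [B, T]])
    cov_diff = np.block([[T, np.zeros((d, d))], [np.zeros((d, d)), T]])
    return (multivariate_normal.logpdf(joint, mean=np.zeros(2 * d), cov=cov_same)
            - multivariate_normal.logpdf(joint, mean=np.zeros(2 * d), cov=cov_diff))

rng = np.random.default_rng(0)
d = 8  # toy dimension; real ECAPA-TDNN embeddings are typically 192-dim
A = rng.standard_normal((d, d))
B = A @ A.T + d * np.eye(d)  # stand-in between-speaker covariance
W = 0.1 * np.eye(d)          # constrained (near-isotropic) within-speaker covariance
x1, x2 = rng.standard_normal(d), rng.standard_normal(d)
print(cosine_score(x1, x2), plda_llr(x1, x2, B, W))

Note that when W is forced to be spherical, the PLDA score depends on the embeddings mainly through their inner products, which hints at why cosine scoring works well on compact large-margin embeddings while a fully parameterized W can become superfluous or even harmful.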