A Comparative Re-Assessment of Feature Extractors for Deep Speaker Embeddings

Paper Title

A Comparative Re-Assessment of Feature Extractors for Deep Speaker Embeddings

Authors

Xuechen Liu, Md Sahidullah, Tomi Kinnunen

Abstract

Modern automatic speaker verification relies largely on deep neural networks (DNNs) trained on mel-frequency cepstral coefficient (MFCC) features. While there are alternative feature extraction methods based on phase, prosody and long-term temporal operations, they have not been extensively studied with DNN-based methods. We aim to fill this gap by providing an extensive re-assessment of 14 feature extractors on the VoxCeleb and SITW datasets. Our findings reveal that features equipped with techniques such as spectral centroids, group delay function, and integrated noise suppression provide promising alternatives to MFCCs for deep speaker embedding extraction. Experimental results demonstrate relative decreases in equal error rate (EER) of up to 16.3% (VoxCeleb) and 25.1% (SITW) with respect to the baseline.
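
The abstract reports its gains as relative decreases in EER. As a minimal sketch of how such a figure is usually computed, assuming the common definition 100 * (baseline - new) / baseline, the helper below may be useful. The function name `relative_eer_reduction` and the input values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def relative_eer_reduction(baseline_eer: float, new_eer: float) -> float:
    """Return the relative EER reduction in percent.

    Assumes the usual definition: 100 * (baseline - new) / baseline.
    Both inputs must be in the same unit (for example, percent EER).
    """
    if baseline_eer <= 0:
        raise ValueError("baseline_eer must be positive")
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


# Hypothetical EER values for illustration only, not taken from the paper.
print(relative_eer_reduction(4.0, 3.0))  # 25.0
```

A relative figure of this kind depends on the baseline: the same absolute drop in EER yields a larger relative reduction when the baseline EER is lower.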
