Paper Title

An iterative framework for self-supervised deep speaker representation learning

Paper Authors

Danwei Cai, Weiqing Wang, Ming Li

Paper Abstract

In this paper, we propose an iterative framework for self-supervised speaker representation learning based on a deep neural network (DNN). The framework starts by training a self-supervised speaker embedding network that maximizes agreement between different segments within an utterance via a contrastive loss. Taking advantage of a DNN's ability to learn from data with label noise, we propose to cluster the speaker embeddings obtained from the previous speaker network and use the subsequent class assignments as pseudo labels to train a new DNN. Moreover, we iteratively train the speaker network with pseudo labels generated in the previous step to bootstrap the discriminative power of the DNN. Speaker verification experiments are conducted on the VoxCeleb dataset. The results show that our proposed iterative self-supervised learning framework outperforms previous self-supervised works. The speaker network after 5 iterations obtains a 61% performance gain over the speaker embedding model trained with contrastive loss.
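The cluster-then-retrain loop described in the abstract can be sketched in miniature. This is an illustrative toy, not the paper's implementation: embeddings are plain 2-D points, `kmeans` is a deterministic farthest-point-initialized Lloyd's algorithm standing in for the clustering step, and `train_speaker_network` merely pulls each embedding toward its pseudo-class centroid to mimic how supervised retraining on pseudo labels tightens within-class variance. All function names and parameters here are assumptions for illustration.

```python
def squared_dist(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    # Simple Lloyd's k-means with deterministic farthest-point
    # initialization; stands in for the paper's clustering step.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: squared_dist(p, centroids[c]))
            for p in points
        ]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

def train_speaker_network(embeddings, pseudo_labels, k, step=0.5):
    # Toy stand-in for retraining the speaker DNN on pseudo labels:
    # move each embedding toward its pseudo-class centroid.
    centroids = {}
    for c in range(k):
        members = [e for e, l in zip(embeddings, pseudo_labels) if l == c]
        if members:
            centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return [
        [x + step * (cx - x) for x, cx in zip(e, centroids[l])]
        for e, l in zip(embeddings, pseudo_labels)
    ]

def iterate(embeddings, k, rounds=5):
    # One round = cluster current embeddings -> pseudo labels ->
    # "retrain" the network -> re-extract embeddings.
    pseudo_labels = None
    for _ in range(rounds):
        pseudo_labels = kmeans(embeddings, k)
        embeddings = train_speaker_network(embeddings, pseudo_labels, k)
    return embeddings, pseudo_labels
```

On well-separated toy data, a few rounds leave the two pseudo-classes consistently assigned, which is the bootstrapping effect the framework relies on: each iteration's labels are slightly less noisy than the last.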
