Paper Title
Unsupervised Contrastive Learning of Sound Event Representations
Paper Authors
Abstract
Self-supervised representation learning can mitigate the limitations in recognition tasks with few manually labeled data but abundant unlabeled data---a common scenario in sound event research. In this work, we explore unsupervised contrastive learning as a way to learn sound event representations. To this end, we propose to use the pretext task of contrasting differently augmented views of sound events. The views are computed primarily via mixing of training examples with unrelated backgrounds, followed by other data augmentations. We analyze the main components of our method via ablation experiments. We evaluate the learned representations using linear evaluation, and in two in-domain downstream sound event classification tasks, namely, using limited manually labeled data, and using noisy labeled data. Our results suggest that unsupervised contrastive pre-training can mitigate the impact of data scarcity and increase robustness against noisy labels, outperforming supervised baselines.
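The core idea of the pretext task, contrasting two differently augmented views of the same sound event where each view is produced by mixing the example with an unrelated background, can be sketched as below. This is a minimal illustration, not the paper's implementation: the mixing weight `alpha`, the function names, and the use of raw waveform-length arrays are all assumptions for clarity.

```python
import numpy as np

def mix_back(example: np.ndarray, background: np.ndarray,
             alpha: float = 0.25) -> np.ndarray:
    """Mix a training example with an unrelated background clip.

    The background is down-weighted so the original event stays dominant.
    `alpha` is a hypothetical mixing weight, not a value from the paper.
    """
    assert example.shape == background.shape
    return (1.0 - alpha) * example + alpha * background

def make_views(example: np.ndarray, backgrounds: list,
               rng: np.random.Generator):
    """Create two differently augmented views of the same sound event
    by mixing it with two distinct, randomly chosen backgrounds.
    Further augmentations would be applied to each view afterwards."""
    i, j = rng.choice(len(backgrounds), size=2, replace=False)
    return mix_back(example, backgrounds[i]), mix_back(example, backgrounds[j])

# Toy usage: random vectors stand in for audio clips (or spectrogram patches).
rng = np.random.default_rng(0)
x = rng.normal(size=16000)
bgs = [rng.normal(size=16000) for _ in range(4)]
v1, v2 = make_views(x, bgs, rng)
```

In a contrastive framework, `v1` and `v2` would then form a positive pair, while views of other examples in the batch serve as negatives.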