Paper Title

The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling

Paper Authors

Tu Anh Nguyen, Maureen de Seyssel, Patricia Rozé, Morgane Rivière, Evgeny Kharitonov, Alexei Baevski, Ewan Dunbar, Emmanuel Dupoux

Paper Abstract

We introduce a new unsupervised task, spoken language modeling: the learning of linguistic representations from raw audio signals without any labels, along with the Zero Resource Speech Benchmark 2021: a suite of 4 black-box, zero-shot metrics probing the quality of the learned models at 4 linguistic levels: phonetics, lexicon, syntax and semantics. We present the results and analyses of a composite baseline made of the concatenation of three unsupervised systems: self-supervised contrastive representation learning (CPC), clustering (k-means) and language modeling (LSTM or BERT). The language models are trained on pseudo-text derived from clustering the learned representations. This simple pipeline shows better-than-chance performance on all four metrics, demonstrating the feasibility of spoken language modeling from raw speech. It nonetheless performs worse than text-based 'topline' systems trained on the same data, delineating the space to be explored by more sophisticated end-to-end models.
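
To make the three-stage pipeline concrete, here is a minimal runnable sketch of its shape: frame-level features are quantized by k-means into discrete units, and the resulting "pseudo-text" is what the language model is trained on. It assumes scikit-learn's KMeans; `extract_cpc_features`, the dummy waveforms, and the choice of k=50 units are illustrative placeholders, not the benchmark's released code (the actual baseline uses a pretrained CPC encoder).

```python
# Sketch of the composite baseline: CPC features -> k-means
# pseudo-text -> input for an LSTM/BERT language model.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

def extract_cpc_features(waveform, frame_dim=256):
    """Hypothetical stand-in for a pretrained CPC encoder: returns
    one embedding per ~10 ms frame of 16 kHz audio, shape (T, D)."""
    n_frames = max(1, len(waveform) // 160)
    return rng.standard_normal((n_frames, frame_dim))

# Dummy audio in place of a real corpus (1 s and 2 s clips at 16 kHz).
waveforms = [rng.standard_normal(16000), rng.standard_normal(32000)]

# 1) Self-supervised representation learning (CPC), frame by frame.
features = [extract_cpc_features(w) for w in waveforms]

# 2) Clustering: quantize frames into k discrete units, turning each
#    utterance into a "pseudo-text" sequence of unit ids.
kmeans = KMeans(n_clusters=50, n_init=10, random_state=0)
kmeans.fit(np.concatenate(features))
pseudo_text = [kmeans.predict(f) for f in features]

# 3) Language modeling: an LSTM or BERT is then trained on these unit
#    sequences exactly as it would be on character strings.
print(pseudo_text[0][:20])  # first 20 discrete units of the first clip
```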
