Paper title
Perceptimatic: A human speech perception benchmark for unsupervised subword modelling
Paper authors
Paper abstract
In this paper, we present a data set and methods to compare speech processing models and human behaviour on a phone discrimination task. We provide Perceptimatic, an open data set which consists of French and English speech stimuli, as well as the results of 91 English- and 93 French-speaking listeners. The stimuli test a wide range of French and English contrasts, and are extracted directly from corpora of natural running read speech, used for the 2017 Zero Resource Speech Challenge. We provide a method to compare humans' perceptual space with models' representational space, and we apply it to models previously submitted to the Challenge. We show that, unlike unsupervised models and supervised multilingual models, a standard supervised monolingual HMM-GMM phone recognition system, while good at discriminating phones, yields a representational space very different from that of human native listeners.
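The abstract describes a method for comparing humans' perceptual space with a model's representational space on a phone discrimination task, but does not spell out the procedure. Below is a minimal sketch, assuming an ABX-style setup: for each triplet, a model "delta" score is computed from DTW-aligned frame distances and then related to human listeners' accuracy on the same triplets. The DTW cosine distance, the `model_delta` score, and the Spearman correlation are illustrative assumptions, not the authors' exact method or released code.

```python
# Minimal sketch (assumptions only, not the paper's implementation) of comparing
# a model's representational space with human phone-discrimination behaviour.
# Each stimulus is assumed to be encoded as a (frames x dims) array.
import numpy as np
from scipy.spatial.distance import cdist
from scipy.stats import spearmanr


def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Dynamic-time-warping distance between two (frames x dims) encodings."""
    cost = cdist(a, b, metric="cosine")              # frame-by-frame cost matrix
    acc = np.full((len(a) + 1, len(b) + 1), np.inf)  # accumulated-cost matrix
    acc[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j],
                                                 acc[i, j - 1],
                                                 acc[i - 1, j - 1])
    return acc[len(a), len(b)] / (len(a) + len(b))   # length-normalised


def model_delta(x: np.ndarray, a: np.ndarray, b: np.ndarray) -> float:
    """ABX-style score: positive when X is closer to A (the matching category)."""
    return dtw_distance(x, b) - dtw_distance(x, a)


# Toy usage with random placeholder data: correlate model deltas with
# (hypothetical) per-triplet human accuracies.
rng = np.random.default_rng(0)
triplets = [(rng.normal(size=(20, 13)),   # X
             rng.normal(size=(18, 13)),   # A, same phone category as X
             rng.normal(size=(22, 13)))   # B, contrasting phone
            for _ in range(50)]
deltas = [model_delta(x, a, b) for x, a, b in triplets]
human_accuracy = rng.uniform(0.4, 1.0, size=len(triplets))  # placeholder values
rho, _ = spearmanr(deltas, human_accuracy)
print(f"Spearman correlation between model deltas and human accuracy: {rho:.2f}")
```

In this kind of analysis, a model that discriminates phones well but correlates poorly with human responses (as the abstract reports for the supervised monolingual HMM-GMM system) would show large deltas that do not track listeners' accuracy pattern.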