Disentangled Latent Speech Representation for Automatic Pathological Intelligibility Assessment

Paper title

Disentangled Latent Speech Representation for Automatic Pathological Intelligibility Assessment

Authors

Tobias Weise, Philipp Klumpp, Kubilay Can Demir, Andreas Maier, Elmar Noeth, Bjoern Heismann, Maria Schuster, Seung Hee Yang

Abstract

Speech intelligibility assessment plays an important role in the therapy of patients suffering from pathological speech disorders. Automatic and objective measures are desirable to assist therapists in their traditionally subjective and labor-intensive assessments. In this work, we investigate a novel approach for obtaining such a measure using the divergence in disentangled latent speech representations of a parallel utterance pair, obtained from a healthy reference speaker and a pathological speaker. Experiments on an English database of Cerebral Palsy patients, using all available utterances per speaker, show high and significant correlation values (R = -0.9) with subjective intelligibility measures, with only minimal deviation (±0.01) across four different reference speaker pairs. We also demonstrate the robustness of the proposed method (R = -0.89, deviating by ±0.02 over 1000 iterations) when considering a significantly smaller number of utterances per speaker. Our results are among the first to show that disentangled speech representations can be used for automatic pathological speech intelligibility assessment, resulting in a reference speaker pair invariant method that is applicable in scenarios where only a few utterances are available.
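
The evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which divergence is used, so mean Euclidean distance between paired latent vectors is an assumption, and all names (`speaker_divergence`, `pearson_r`, `subsampled_r`) as well as the latent vectors and intelligibility scores are hypothetical, synthetic values chosen only to make the example run.

```python
import numpy as np


def speaker_divergence(ref_latents, path_latents):
    """Mean Euclidean distance between paired latent vectors.

    Both inputs have shape (n_utterances, latent_dim); row i of each
    array is assumed to come from the same (parallel) utterance.
    """
    ref = np.asarray(ref_latents, dtype=float)
    path = np.asarray(path_latents, dtype=float)
    if ref.shape != path.shape:
        raise ValueError("reference and pathological latents must be paired")
    return float(np.linalg.norm(ref - path, axis=1).mean())


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D sequences."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])


def subsampled_r(per_utt_dists, scores, n_utts, n_iter, rng):
    """Correlation when only n_utts random utterances per speaker are used."""
    rs = []
    for _ in range(n_iter):
        means = [rng.choice(d, size=n_utts, replace=False).mean()
                 for d in per_utt_dists]
        rs.append(pearson_r(means, scores))
    return np.array(rs)


# Synthetic toy data (hypothetical): lower intelligibility score means
# the pathological latents drift further from the reference latents.
rng = np.random.default_rng(0)
scores = [90.0, 70.0, 50.0, 30.0, 10.0]      # made-up subjective ratings
ref = rng.normal(size=(50, 16))              # 50 utterances, 16-dim latents
path = [ref + rng.normal(size=ref.shape) * (100.0 - s) / 100.0 for s in scores]

divergences = [speaker_divergence(ref, p) for p in path]
r = pearson_r(divergences, scores)

# Robustness check with 5 utterances per speaker over 100 random draws.
per_utt = [np.linalg.norm(ref - p, axis=1) for p in path]
rs = subsampled_r(per_utt, scores, n_utts=5, n_iter=100, rng=rng)
print(f"r (all utterances) = {r:.3f}")
print(f"r (5 utterances)   = {rs.mean():.3f} +- {rs.std():.3f}")
```

On this toy data the correlation is negative by construction, which mirrors the sign of the reported values: a larger divergence from the healthy reference goes with a lower intelligibility rating.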
