Representation Learning Strategies to Model Pathological Speech: Effect of Multiple Spectral Resolutions

Paper title

Representation Learning Strategies to Model Pathological Speech: Effect of Multiple Spectral Resolutions

Authors

Miller, Gabriel Figueiredo; Vásquez-Correa, Juan Camilo; Orozco-Arroyave, Juan Rafael; Nöth, Elmar

Abstract

This paper considers a representation learning strategy to model speech signals from patients with Parkinson's disease and cleft lip and palate. In particular, it compares different parametrized representation types, such as wideband and narrowband spectrograms and wavelet-based scalograms, with the goal of quantifying the representation capacity of each. Representation capacity is quantified by the ability of the proposed model to classify different pathologies and the associated disease severity. Additionally, this paper proposes a novel fusion strategy called multi-spectral fusion, which combines wideband and narrowband spectral resolutions using a representation learning strategy based on autoencoders. The proposed models classify speech from Parkinson's disease patients with an accuracy of up to 95%. They also assess the dysarthria severity of Parkinson's disease patients with a Spearman correlation of up to 0.75. These results outperform those reported in the literature for the same problem on the same corpus.
