WaDeNet: Wavelet Decomposition based CNN for Speech Processing

Paper Title

WaDeNet: Wavelet Decomposition based CNN for Speech Processing

Authors

Suresh, Prithvi; Ragav, Abhijith

Abstract

Existing speech processing systems consist of separate modules, each individually optimized for a specific task such as acoustic modelling or feature extraction. In addition to not assuring optimality of the overall system, the disjoint nature of current speech processing systems makes them unsuitable for ubiquitous health applications. We propose WaDeNet, an end-to-end model for mobile speech processing. To incorporate spectral features, WaDeNet embeds wavelet decomposition of the speech signal within the architecture. This allows WaDeNet to learn from spectral features in an end-to-end manner, alleviating the need for the feature extraction and successive modules that are currently present in speech processing systems. WaDeNet outperforms the current state of the art on datasets that involve speech for mobile health applications such as non-invasive emotion recognition. WaDeNet achieves an average increase in accuracy of 6.36% compared to existing state-of-the-art models. Additionally, WaDeNet is considerably lighter than a simple CNN with a similar architecture.
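
To make the idea of wavelet decomposition concrete, the following is a minimal sketch of a multi-level discrete wavelet transform of a one-dimensional signal. It is not WaDeNet's architecture: the abstract does not name a wavelet family or a number of levels, so the Haar wavelet, the three levels, the synthetic test signal, and the function names `haar_dwt_step` and `haar_decompose` are all assumptions chosen for illustration.

```python
import numpy as np

def haar_dwt_step(x):
    """One level of the Haar DWT (assumed wavelet; not specified in the abstract)."""
    x = np.asarray(x, dtype=float)
    if x.size % 2:
        # Pad odd-length input by repeating the last sample.
        x = np.append(x, x[-1])
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)  # low-pass (coarse) coefficients
    detail = (even - odd) / np.sqrt(2.0)  # high-pass (fine) coefficients
    return approx, detail

def haar_decompose(x, levels):
    """Return [approx_L, detail_L, ..., detail_1] for an L-level decomposition."""
    details = []
    approx = np.asarray(x, dtype=float)
    for _ in range(levels):
        approx, detail = haar_dwt_step(approx)
        details.append(detail)
    return [approx] + details[::-1]

# Synthetic stand-in for a speech frame (hypothetical data).
signal = np.sin(2 * np.pi * 5 * np.arange(1024) / 1024)
bands = haar_decompose(signal, levels=3)
band_lengths = [len(b) for b in bands]
```

Each level halves the time resolution and splits the signal into a coarse band and a detail band, so the resulting sub-bands carry frequency-localized information that a convolutional network could, in principle, consume directly.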
