Heterogeneous Reservoir Computing Models for Persian Speech Recognition

Paper title

Heterogeneous Reservoir Computing Models for Persian Speech Recognition

Authors

Zohreh Ansari, Farzin Pourhoseini, Fatemeh Hadaeghi

Abstract

Over the last decade, deep-learning methods have been gradually incorporated into conventional automatic speech recognition (ASR) frameworks to create acoustic, pronunciation, and language models. Although this has led to significant improvements in the recognition accuracy of ASR systems, deep-learning methods impose hard constraints on hardware (e.g., computing power and memory usage), so it is unclear whether they are the most computationally efficient and energy-efficient options for embedded ASR applications. Reservoir computing (RC) models (e.g., echo state networks (ESNs) and liquid state machines (LSMs)), on the other hand, have proven inexpensive to train, have vastly fewer parameters, and are compatible with emerging hardware technologies. However, their performance in speech processing tasks is inferior to that of deep-learning-based models. To enhance the accuracy of RC in ASR applications, we propose heterogeneous single-layer and multi-layer ESNs that create non-linear transformations of the inputs and capture temporal context at different scales. To test our models, we performed a speech recognition task on the Farsdat Persian dataset. Since, to the best of our knowledge, standard RC has not yet been applied to any Persian ASR task, we also trained conventional single-layer and deep ESNs to provide baselines for comparison. In addition, we compared the RC performance with a standard long short-term memory (LSTM) model. The heterogeneous RC models (1) outperform the standard RC models, (2) perform on par with the LSTM in terms of recognition accuracy, and (3) reduce the training time considerably.
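
To make the idea of capturing temporal context at different scales concrete, the following is a minimal NumPy sketch of a single-layer heterogeneous ESN. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how heterogeneity is realized, so the sketch assumes several sub-reservoirs that differ only in their leak rate, with states concatenated and passed to a ridge-regression readout. All function names, sizes, leak rates, and the random toy data are hypothetical.

```python
import numpy as np

def make_reservoir(n_in, n_res, spectral_radius, rng):
    """Draw fixed random weights and rescale the recurrent matrix
    so that its spectral radius matches the requested value."""
    w_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
    w = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w

def run_leaky_esn(u, w_in, w, leak):
    """Return leaky-integrator ESN states for an input sequence u of shape (T, n_in)."""
    x = np.zeros(w.shape[0])
    states = np.empty((u.shape[0], w.shape[0]))
    for t, u_t in enumerate(u):
        # A small leak rate gives slow dynamics (long context); leak = 1 gives fast dynamics.
        x = (1.0 - leak) * x + leak * np.tanh(w_in @ u_t + w @ x)
        states[t] = x
    return states

def heterogeneous_states(u, leaks, n_res=50, spectral_radius=0.9, seed=0):
    """Assumption: heterogeneity = one sub-reservoir per leak rate, states concatenated."""
    rng = np.random.default_rng(seed)
    blocks = []
    for leak in leaks:
        w_in, w = make_reservoir(u.shape[1], n_res, spectral_radius, rng)
        blocks.append(run_leaky_esn(u, w_in, w, leak))
    return np.concatenate(blocks, axis=1)

def train_readout(states, targets, ridge=1e-2):
    """Closed-form ridge regression; the readout is the only trained part."""
    a = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(a, states.T @ targets)

# Toy data standing in for acoustic frames and per-frame class labels (not Farsdat).
rng = np.random.default_rng(1)
u = rng.standard_normal((200, 13))
labels = rng.integers(0, 3, size=200)
targets = np.eye(3)[labels]          # one-hot targets

states = heterogeneous_states(u, leaks=(0.1, 0.5, 1.0))
w_out = train_readout(states, targets)
pred = (states @ w_out).argmax(axis=1)
```

Because the reservoir weights stay fixed and only the linear readout is solved in closed form, training cost is dominated by one pass over the data and one linear solve, which is consistent with the abstract's point that RC models are inexpensive to train.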
