A Low-Latency, ASR-Free End-to-End Spoken Language Understanding System

Paper title

A low latency ASR-free end to end spoken language understanding system

Authors

Mohamed Mhiri, Samuel Myer, Vikrant Singh Tomar

Abstract

In recent years, developing a speech understanding system that classifies a waveform into structured data, such as intents and slots, without first transcribing the speech to text has emerged as an interesting research problem. This work proposes such a system, with the additional constraint that its footprint be small enough to run on small microcontrollers and embedded systems with minimal latency. Given a streaming input speech signal, the proposed system can process it segment by segment, without needing the entire stream at the moment of processing. The proposed system is evaluated on the publicly available Fluent Speech Commands dataset. Experiments show that it yields state-of-the-art performance, with the advantages of low latency and a much smaller model compared to other published works on the same task.
