Paper Title

Transformer-based Streaming ASR with Cumulative Attention

Paper Authors

Mohan Li, Shucong Zhang, Catalin Zorila, Rama Doddipatla

Paper Abstract

In this paper, we propose an online attention mechanism, known as cumulative attention (CA), for streaming Transformer-based automatic speech recognition (ASR). Inspired by the monotonic chunkwise attention (MoChA) and head-synchronous decoder-end adaptive computation steps (HS-DACS) algorithms, CA triggers the ASR outputs based on the acoustic information accumulated at each encoding timestep, where the decisions are made using a trainable device referred to as the halting selector. In CA, all the attention heads of the same decoder layer are synchronised to have a unified halting position. This feature effectively alleviates the problem caused by the distinct behaviour of individual heads, which may otherwise give rise to severe latency issues, as encountered by MoChA. ASR experiments conducted on the AIShell-1 and Librispeech datasets demonstrate that the proposed CA-based Transformer system achieves on-par or better performance with a significant reduction in latency during inference, compared to other streaming Transformer systems in the literature.
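
The following is a minimal, hypothetical PyTorch sketch of the idea summarised in the abstract: attention is recomputed over the encoder frames received so far, and a trainable "halting selector" decides when enough acoustic information has accumulated to emit an output, with all heads of the decoder layer sharing that single halting position. The class name, layer sizes, threshold, and the pooled-context halting rule are assumptions made for illustration, not the authors' implementation.

# Minimal, hypothetical sketch of cumulative attention with a shared halting
# position (illustrative only; not the authors' implementation).
import torch
import torch.nn as nn


class CumulativeAttentionSketch(nn.Module):
    """One decoder-layer attention block that accumulates context over the
    encoder frames seen so far and halts all heads at a single position."""

    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # Hypothetical "halting selector": scores the accumulated context and
        # decides whether enough acoustic information has been seen.
        self.halting_selector = nn.Linear(d_model, 1)

    def forward(self, query, enc_out, threshold=0.5):
        # query: (d_model,) current decoder state; enc_out: (T, d_model) streamed encoder frames.
        T, d_model = enc_out.shape
        q = self.q_proj(query).view(self.n_heads, self.d_head)        # (H, d_h)
        k = self.k_proj(enc_out).view(T, self.n_heads, self.d_head)   # (T, H, d_h)
        v = self.v_proj(enc_out).view(T, self.n_heads, self.d_head)   # (T, H, d_h)

        halt_at = T  # fall back to consuming every available frame
        for t in range(1, T + 1):
            # Attention over the first t frames, per head, giving the context so far.
            energy = torch.einsum('hd,thd->ht', q, k[:t]) / self.d_head ** 0.5
            weights = torch.softmax(energy, dim=-1)                    # (H, t)
            context = torch.einsum('ht,thd->hd', weights, v[:t])       # (H, d_h)
            # One halting decision on the pooled context, shared by all heads.
            p_halt = torch.sigmoid(self.halting_selector(context.reshape(-1)))
            if p_halt.item() > threshold:
                halt_at = t
                break
        return context.reshape(d_model), halt_at


# Usage with random tensors standing in for real encoder outputs.
layer = CumulativeAttentionSketch()
ctx, halt_pos = layer(torch.randn(256), torch.randn(50, 256))
print(ctx.shape, halt_pos)

The per-step recomputation over the first t frames is kept for readability; a streaming implementation would instead cache keys and values incrementally as encoder frames arrive.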
