Paper title
Improving CTC-based ASR Models with Gated Interlayer Collaboration
Paper authors
Paper abstract
CTC-based automatic speech recognition (ASR) models without an external language model usually lack the capacity to model conditional dependencies and textual interactions. In this paper, we present a Gated Interlayer Collaboration (GIC) mechanism that improves the performance of CTC-based models by introducing textual information into the model, thereby relaxing their conditional independence assumption. Specifically, we take the weighted sum of token embeddings as the textual representation at each position, where the position-specific weights are the softmax probability distribution constructed via inter-layer auxiliary CTC losses. A gate unit then fuses the textual representations with the acoustic features. Experiments on the AISHELL-1, TEDLIUM2, and AIDATATANG corpora show that the proposed method outperforms several strong baselines.
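
The two steps named in the abstract (a softmax-weighted sum of token embeddings, followed by a gated fusion with the acoustic features) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The abstract does not give the gate formula, so the form used here (a sigmoid over the concatenated acoustic and textual vectors, followed by an elementwise blend) is an assumption, and all names (`textual_representation`, `gated_fusion`, `gate_w`, `gate_b`, `inter_logits`) and the random toy values are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over one frame's vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def textual_representation(logits, embeddings):
    """Weighted sum of token embeddings; weights are the softmax probabilities."""
    probs = softmax(logits)
    dim = len(embeddings[0])
    return [sum(p * emb[d] for p, emb in zip(probs, embeddings)) for d in range(dim)]


def gated_fusion(acoustic, textual, gate_w, gate_b):
    """ASSUMED gate form: g = sigmoid(W [acoustic; textual] + b),
    fused = g * acoustic + (1 - g) * textual (elementwise)."""
    joint = acoustic + textual  # list concatenation of the two vectors
    fused = []
    for d in range(len(acoustic)):
        z = sum(w * x for w, x in zip(gate_w[d], joint)) + gate_b[d]
        g = 1.0 / (1.0 + math.exp(-z))
        fused.append(g * acoustic[d] + (1.0 - g) * textual[d])
    return fused


# Toy setup with random values (hypothetical sizes, not from the paper).
random.seed(0)
vocab_size, dim, num_frames = 5, 4, 3
embeddings = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(vocab_size)]
gate_w = [[random.uniform(-1, 1) for _ in range(2 * dim)] for _ in range(dim)]
gate_b = [0.0] * dim
acoustic_frames = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(num_frames)]
# Stand-in for the logits of an intermediate layer trained with an auxiliary CTC loss.
inter_logits = [[random.uniform(-2, 2) for _ in range(vocab_size)] for _ in range(num_frames)]

fused_frames = [
    gated_fusion(a, textual_representation(l, embeddings), gate_w, gate_b)
    for a, l in zip(acoustic_frames, inter_logits)
]
```

Because the weights come from a softmax, a frame whose intermediate prediction is nearly one-hot receives a textual representation close to that token's embedding, while an uncertain frame receives a blend of several embeddings. Under the assumed gate, each fused value lies between the corresponding acoustic and textual values.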