Paper title
LAE: Language-Aware Encoder for Monolingual and Multilingual ASR
Authors
Abstract
Despite rapid progress in automatic speech recognition (ASR) research, recognizing multilingual speech with a unified ASR system remains highly challenging. Previous work on multilingual speech recognition mainly focuses on two directions: recognizing monolingual speech in multiple languages, or recognizing code-switched speech, in which different languages are used interchangeably within a single utterance. However, a pragmatic multilingual recognizer is expected to be compatible with both directions. In this work, a novel language-aware encoder (LAE) architecture is proposed to handle both situations by disentangling language-specific information and generating frame-level language-aware representations during encoding. In the LAE, the primary encoding is implemented by a shared block, while language-specific blocks are used to extract specific representations for each language. To learn language-specific information discriminatively, a language-aware training method is proposed to optimize the language-specific blocks in the LAE. Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating between languages at the frame level and shows superior performance on both monolingual and multilingual ASR tasks. With either a real-recorded or a simulated code-switched dataset, the proposed LAE achieves statistically significant improvements on both CTC and neural transducer systems. Code is released
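The encoder layout described in the abstract (one shared block followed by one language-specific block per language, all operating frame by frame) can be illustrated with a minimal, dependency-free sketch. This is not the released code. The class name `LanguageAwareEncoderSketch`, the single linear-plus-ReLU layers, the random weights, and in particular the element-wise sum used to fuse the language-specific outputs are all assumptions made for illustration; the abstract does not specify them.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Random (out_dim x in_dim) weight matrix standing in for a trained block."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_block(weights, frames):
    """Apply one linear layer followed by ReLU to every frame independently."""
    return [
        [max(0.0, sum(w * x for w, x in zip(row, frame))) for row in weights]
        for frame in frames
    ]


class LanguageAwareEncoderSketch:
    """Hypothetical toy layout: shared block, then one block per language."""

    def __init__(self, feat_dim, hidden_dim, languages, seed=0):
        rng = random.Random(seed)
        # Shared block: performs the primary encoding for all languages.
        self.shared = make_linear(feat_dim, hidden_dim, rng)
        # Language-specific blocks: one per language, fed by the shared output.
        self.specific = {
            lang: make_linear(hidden_dim, hidden_dim, rng) for lang in languages
        }

    def encode(self, frames):
        shared_out = apply_block(self.shared, frames)
        # Frame-level representation for each language.
        per_language = {
            lang: apply_block(weights, shared_out)
            for lang, weights in self.specific.items()
        }
        # Assumption: fuse the language-specific outputs by element-wise sum.
        fused = [
            [sum(values) for values in zip(*(per_language[lang][t] for lang in per_language))]
            for t in range(len(frames))
        ]
        return per_language, fused


encoder = LanguageAwareEncoderSketch(
    feat_dim=4, hidden_dim=3, languages=["mandarin", "english"]
)
frames = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.2]]  # two toy acoustic frames
per_language, fused = encoder.encode(frames)
print(len(fused), len(fused[0]))  # prints: 2 3
```

In a real system each block would be a stack of trained encoder layers, and the language-aware training method mentioned in the abstract would supervise the per-language outputs; the sketch only shows how the frame-level data flows through the shared and language-specific blocks.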