Pushing the performances of ASR models on English and Spanish accents

Paper title

Pushing the performances of ASR models on English and Spanish accents

Authors

Chitkara, Pooja; Riviere, Morgane; Copet, Jade; Zhang, Frank; Saraf, Yatharth

Abstract

Speech-to-text models tend to be trained and evaluated against a single target accent. This is especially true for English, for which native speakers from the United States have become the main benchmark. In this work, we show how two simple methods, pre-trained embeddings and auxiliary classification losses, can improve the performance of ASR systems. We are looking for improvements that are as universal as possible, and we therefore explore their impact on several model architectures and several languages.
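
The abstract names an auxiliary classification loss without giving its form. As a rough illustration only, and not the authors' formulation, such a loss is commonly added to the main ASR objective as a weighted cross-entropy term over accent labels. The function names, the `aux_weight` parameter, and the numeric values below are all hypothetical.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy for one example, computed from raw class scores."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


def joint_loss(asr_loss, accent_logits, accent_label, aux_weight=0.1):
    """Main ASR loss plus a weighted auxiliary accent-classification loss.

    aux_weight is an assumed hyperparameter; the abstract does not specify one.
    """
    aux = softmax_cross_entropy(accent_logits, accent_label)
    return asr_loss + aux_weight * aux


# Hypothetical values: an ASR loss of 2.0 and an accent classifier
# that is undecided between three accents.
total = joint_loss(asr_loss=2.0, accent_logits=[0.0, 0.0, 0.0],
                   accent_label=1, aux_weight=0.5)
print(total)
```

In this sketch, setting `aux_weight` to zero recovers the plain ASR objective, so the auxiliary term can be switched off without touching the rest of the training code.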
