Paper Title

ASR in German: A Detailed Error Analysis

Authors

Wirth, Johannes; Peinl, Rene

Abstract

The number of freely available neural-network-based systems for automatic speech recognition (ASR) is growing steadily, and their predictions are becoming increasingly reliable. However, trained models are typically evaluated exclusively with statistical metrics such as word error rate (WER) or character error rate (CER), which provide no insight into the nature or impact of the errors produced when predicting transcripts from speech input. This work presents a selection of ASR model architectures that are pretrained on the German language and evaluates them on a benchmark of diverse test datasets. It identifies cross-architectural prediction errors, classifies them into categories, and traces the sources of errors in each category back to the training data as well as other sources. Finally, it discusses solutions for creating qualitatively better training datasets and more robust ASR systems.
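To make the abstract's point about WER and CER concrete, here is a minimal sketch of how the two metrics are commonly computed from a Levenshtein edit distance. It is not the authors' evaluation code. The function names and the German example sentences are hypothetical, chosen only for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference token
                            curr[j - 1] + 1,      # insertion of a hypothesis token
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical examples (not taken from the paper).
reference = "das ist ein haus"
hyp_meaning_change = "das ist ein maus"   # different word, different meaning
hyp_spelling_only = "dass ist ein haus"   # spelling-level confusion

for hyp in (hyp_meaning_change, hyp_spelling_only):
    print(f"{hyp!r}: WER={wer(reference, hyp):.4f}, CER={cer(reference, hyp):.4f}")
```

Both hypotheses receive the same WER and the same CER, although one changes the meaning of the sentence and the other is a spelling-level confusion. This is the kind of difference that aggregate metrics cannot express and that a categorized error analysis aims to expose.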
