Paper title
Non-intrusive speech intelligibility prediction using automatic speech recognition derived measures
Paper authors
Paper abstract
The estimation of speech intelligibility is still far from being a solved problem. One aspect is especially problematic: most standard models require a clean reference signal in order to estimate intelligibility. This is a significant limitation, as a reference signal is often unavailable in practice. This work therefore presents a non-intrusive speech intelligibility estimation framework. In it, human listeners' performance in keyword recognition tasks is predicted using intelligibility measures derived from models trained for automatic speech recognition (ASR). One such ASR-based measure and one signal-based measure are combined into a full framework, the proposed NO-Reference Intelligibility (NORI) estimator, which is evaluated in predicting the performance of both normal-hearing and hearing-impaired listeners in multiple noise conditions. The NORI framework is shown to outperform even the widely used reference-based (or intrusive) short-term objective intelligibility (STOI) measure in most considered scenarios, while remaining applicable in fully blind scenarios with no reference signal or transcription. This creates perspectives for online and personalized optimization of speech enhancement systems.
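The abstract does not say how the ASR-based and signal-based measures are combined or mapped to listener performance. As a purely illustrative sketch, the following assumes a weighted sum of the two measures passed through a logistic function, with weights fitted to observed keyword recognition rates. The function names, the logistic mapping, and the toy data are all hypothetical and are not taken from the paper.

```python
import math

def predict_intelligibility(asr_measure, signal_measure, weights):
    """Map two measures to a predicted keyword recognition rate in [0, 1].

    Assumption: a logistic function of a weighted sum. The paper's actual
    combination rule is not given in the abstract.
    """
    w_asr, w_sig, bias = weights
    z = w_asr * asr_measure + w_sig * signal_measure + bias
    return 1.0 / (1.0 + math.exp(-z))

def fit_weights(samples, lr=0.5, epochs=2000):
    """Fit (w_asr, w_sig, bias) by gradient descent on squared error.

    samples: list of (asr_measure, signal_measure, observed_rate) tuples.
    """
    w = [0.0, 0.0, 0.0]
    n = len(samples)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for a, s, y in samples:
            p = predict_intelligibility(a, s, w)
            d = (p - y) * p * (1.0 - p)  # gradient of 0.5 * (p - y)^2 w.r.t. z
            grad[0] += d * a
            grad[1] += d * s
            grad[2] += d
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return tuple(w)

# Made-up toy data: (ASR-based measure, signal-based measure, observed rate).
TOY_SAMPLES = [
    (0.1, 0.2, 0.05),
    (0.3, 0.3, 0.20),
    (0.5, 0.5, 0.50),
    (0.7, 0.6, 0.80),
    (0.9, 0.9, 0.95),
]

weights = fit_weights(TOY_SAMPLES)
low = predict_intelligibility(0.1, 0.2, weights)
high = predict_intelligibility(0.9, 0.9, weights)
print(f"predicted rate, poor condition: {low:.2f}")
print(f"predicted rate, good condition: {high:.2f}")
```

Because both inputs are computed from the degraded signal alone, a mapping of this kind needs no clean reference or transcription at prediction time; listener data is only needed once, to fit the weights.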