Paper Title


Cross-Domain Adaptation of Spoken Language Identification for Related Languages: The Curious Case of Slavic Languages

Paper Authors

Abdullah, Badr M., Avgustinova, Tania, Möbius, Bernd, Klakow, Dietrich

Paper Abstract


State-of-the-art spoken language identification (LID) systems, which are based on end-to-end deep neural networks, have shown remarkable success not only in discriminating between distant languages but also between closely-related languages or even different spoken varieties of the same language. However, it is still unclear to what extent neural LID models generalize to speech samples with different acoustic conditions due to domain shift. In this paper, we present a set of experiments to investigate the impact of domain mismatch on the performance of neural LID systems for a subset of six Slavic languages across two domains (read speech and radio broadcast) and examine two low-level signal descriptors (spectral and cepstral features) for this task. Our experiments show that (1) out-of-domain speech samples severely hinder the performance of neural LID models, and (2) while both spectral and cepstral features show comparable performance within-domain, spectral features show more robustness under domain mismatch. Moreover, we apply unsupervised domain adaptation to minimize the discrepancy between the two domains in our study. We achieve relative accuracy improvements that range from 9% to 77% depending on the diversity of acoustic conditions in the source domain.
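The abstract contrasts two low-level signal descriptors: spectral features (e.g. a log-mel spectrogram) and cepstral features (e.g. MFCCs, obtained by applying a DCT to the log-mel spectrogram). The paper does not specify its exact extraction pipeline, so the following is only a minimal NumPy/SciPy sketch of how the two feature types relate, with illustrative parameter choices (16 kHz audio, 25 ms frames, 10 ms hop, 40 mel bands, 13 cepstral coefficients) that are assumptions, not the authors' configuration:

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(signal, sr=16000, n_fft=400, hop=160, n_mels=40):
    """Spectral features: log-energies of a mel filterbank applied
    to the short-time power spectrum."""
    # Frame the signal with a Hann window and take the power spectrum.
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2  # (n_frames, n_fft//2 + 1)

    # Build a triangular mel filterbank spanning 0 .. sr/2.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)

    return np.log(power @ fbank.T + 1e-10)  # (n_frames, n_mels)

def mfcc(signal, sr=16000, n_ceps=13, **kwargs):
    """Cepstral features: a type-II DCT decorrelates the log-mel bands;
    keeping the first few coefficients yields MFCCs."""
    log_mel = log_mel_spectrogram(signal, sr=sr, **kwargs)
    return dct(log_mel, type=2, axis=1, norm="ortho")[:, :n_ceps]

# Toy input: one second of a 440 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
sig = np.sin(2 * np.pi * 440.0 * t)

spectral = log_mel_spectrogram(sig, sr)  # shape (98, 40)
cepstral = mfcc(sig, sr)                 # shape (98, 13)
print(spectral.shape, cepstral.shape)
```

Since MFCCs are a lossy, decorrelated transform of the log-mel representation, the paper's finding that spectral features are more robust under domain mismatch amounts to saying the extra DCT compression discards information that helps the model cope with unseen acoustic conditions.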
