Paper Title

Language vs Speaker Change: A Comparative Study

Authors

Jagabandhu Mishra, S. R. Mahadeva Prasanna

Abstract

Spoken language change detection (LCD) refers to detecting language switching points in a multilingual speech signal. Speaker change detection (SCD) refers to locating speaker change points in a multispeaker speech signal. The objective of this work is to understand the challenges of the LCD task by comparing it with the SCD task. A human subjective study of change detection is performed for both LCD and SCD. The study demonstrates that LCD requires spectro-temporal information of longer duration around the change point than SCD does. Based on this, the work explores automatic distance-based and model-based LCD approaches. The model-based approaches include Gaussian mixture model and universal background model (GMM-UBM), attention, and generative adversarial network (GAN) based approaches. Both the human and the automatic LCD studies indicate that LCD performance improves as more spectro-temporal duration is incorporated.
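
The abstract names distance-based change detection but does not state the features or the distance measure. As a rough illustration only, the sketch below assumes a sequence of frame-level feature vectors, compares the mean of a window to the left of each frame with the mean of a window to the right using Euclidean distance, and reports the frame with the largest distance. The function names, the Euclidean distance, the single-change assumption, and the synthetic toy signal are all assumptions made for this example and are not taken from the paper. The `window` parameter stands in for the amount of spectro-temporal context used around the candidate change point.

```python
import math
import random


def window_mean(frames, start, stop):
    """Average the feature vectors frames[start:stop] dimension by dimension."""
    dim = len(frames[0])
    count = stop - start
    return [sum(frames[i][d] for i in range(start, stop)) / count for d in range(dim)]


def distance_contour(frames, window):
    """Distance between the mean of the left and right windows at each frame.

    Returns (frame_index, distance) pairs for every index that has a full
    window of `window` frames on both sides. Euclidean distance is an
    assumption made for this sketch.
    """
    contour = []
    for t in range(window, len(frames) - window + 1):
        left = window_mean(frames, t - window, t)
        right = window_mean(frames, t, t + window)
        contour.append((t, math.dist(left, right)))
    return contour


def detect_change(frames, window):
    """Return the frame index with the largest distance (one change assumed)."""
    return max(distance_contour(frames, window), key=lambda pair: pair[1])[0]


def make_toy_signal(change_at=100, total=200, dim=4, seed=0):
    """Synthetic stand-in for real features: two Gaussian segments with shifted means."""
    rng = random.Random(seed)
    frames = []
    for i in range(total):
        mean = 0.0 if i < change_at else 1.0
        frames.append([rng.gauss(mean, 1.0) for _ in range(dim)])
    return frames


if __name__ == "__main__":
    toy = make_toy_signal()
    # Try several context lengths around the candidate change point.
    for w in (5, 20, 50):
        print(w, detect_change(toy, w))
```

Running the script prints the detected change index for several window lengths on the toy signal, which gives a feel for how the amount of context affects the estimate. Real LCD or SCD systems would replace the toy features with spectral features extracted from speech.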
