Paper Title

On the Use of Semantically-Aligned Speech Representations for Spoken Language Understanding

Paper Authors

Gaëlle Laperrière, Valentin Pelloin, Mickaël Rouvier, Themos Stafylakis, Yannick Estève

Paper Abstract

In this paper we examine the use of semantically-aligned speech representations for end-to-end spoken language understanding (SLU). We employ the recently introduced SAMU-XLSR model, which is designed to generate a single embedding that captures semantics at the utterance level, aligned across different languages. This model combines the acoustic frame-level speech representation model (XLS-R) with the Language Agnostic BERT Sentence Embedding (LaBSE) model. We show that using the SAMU-XLSR model instead of the initial XLS-R model significantly improves performance in an end-to-end SLU framework. Finally, we present the benefits of this model for language portability in SLU.
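
To make the architecture described in the abstract concrete, below is a minimal sketch of the SAMU-XLSR training idea: pool XLS-R frame-level representations into a single utterance-level embedding and pull it toward the frozen LaBSE embedding of the matching transcription. The checkpoint names, mean pooling, linear projection, and cosine objective are illustrative assumptions, not the authors' exact recipe.

```python
# Sketch (not the authors' code) of the SAMU-XLSR idea: align a pooled XLS-R
# utterance embedding with the LaBSE embedding of the transcription.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import Wav2Vec2Model, AutoTokenizer, AutoModel


class SamuXlsrSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # Frame-level acoustic encoder (XLS-R); checkpoint choice is an assumption.
        self.speech_encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-xls-r-300m")
        # Project the pooled speech embedding to the LaBSE embedding size (768).
        self.projection = nn.Linear(self.speech_encoder.config.hidden_size, 768)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples) raw 16 kHz audio
        frames = self.speech_encoder(waveform).last_hidden_state  # (batch, T, H)
        pooled = frames.mean(dim=1)                               # utterance-level pooling (assumed)
        return F.normalize(self.projection(pooled), dim=-1)


# Text side: frozen LaBSE sentence embeddings of the transcriptions.
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/LaBSE")
labse = AutoModel.from_pretrained("sentence-transformers/LaBSE").eval()


def labse_embedding(sentences):
    batch = tokenizer(sentences, padding=True, return_tensors="pt")
    with torch.no_grad():
        out = labse(**batch)
    # LaBSE exposes its sentence embedding through the pooled [CLS] output.
    return F.normalize(out.pooler_output, dim=-1)


# Training-step sketch: maximize cosine similarity between the speech embedding
# and the LaBSE embedding of the matching transcription.
model = SamuXlsrSketch()
audio = torch.randn(2, 16000)  # two 1-second dummy utterances
targets = labse_embedding(["hello world", "bonjour le monde"])
loss = 1.0 - F.cosine_similarity(model(audio), targets).mean()
loss.backward()
```

In the paper's SLU setting, a speech encoder pretrained this way (semantically aligned across languages) then takes the place of the plain XLS-R encoder at the front end of the end-to-end SLU model, which is what the reported performance and language-portability gains are attributed to.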
