Paper title
On the Use of Semantically-Aligned Speech Representations for Spoken Language Understanding
Paper authors
Paper abstract
In this paper we examine the use of semantically-aligned speech representations for end-to-end spoken language understanding (SLU). We employ the recently introduced SAMU-XLSR model, which is designed to generate a single embedding that captures the semantics at the utterance level and is semantically aligned across different languages. This model combines the acoustic frame-level speech representation model (XLS-R) with the Language Agnostic BERT Sentence Embedding (LaBSE) model. We show that using the SAMU-XLSR model instead of the initial XLS-R model significantly improves performance in an end-to-end SLU framework. Finally, we present the benefits of this model for language portability in SLU.
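The core idea described above (pooling frame-level XLS-R representations into a single utterance embedding and aligning it with a LaBSE sentence embedding) can be sketched as follows. This is a minimal illustration with random arrays standing in for the real encoders, not the authors' implementation: `speech_frames` is an assumed stand-in for XLS-R frame-level output, `text_vec` for a LaBSE sentence embedding, and mean pooling plus a cosine objective is one simple alignment scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_pool(frames):
    """Collapse frame-level features (T, D) into one utterance vector (D,)."""
    return frames.mean(axis=0)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical stand-ins for the real encoders (shapes chosen for illustration):
# speech_frames ~ XLS-R frame-level output, text_vec ~ LaBSE sentence embedding.
T, D = 50, 8
speech_frames = rng.normal(size=(T, D))
text_vec = rng.normal(size=(D,))

# A single utterance-level speech embedding.
utt_vec = mean_pool(speech_frames)

# Alignment objective (sketch): pull the speech embedding toward the
# text embedding by minimizing 1 - cosine similarity.
loss = 1.0 - cosine(utt_vec, text_vec)
print(f"alignment loss: {loss:.3f}")
```

In the actual SAMU-XLSR setup the pooling and alignment are learned end-to-end; the sketch only shows the shape of the objective: one vector per utterance, scored against a multilingual sentence embedding.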