Paper title
On the Use of Semantically-Aligned Speech Representations for Spoken Language Understanding
Paper authors
Paper abstract
In this paper we examine the use of semantically-aligned speech representations for end-to-end spoken language understanding (SLU). We employ the recently introduced SAMU-XLSR model, which is designed to generate a single embedding that captures the semantics at the utterance level and is semantically aligned across different languages. This model combines the acoustic frame-level speech representation model (XLS-R) with the Language Agnostic BERT Sentence Embedding (LaBSE) model. We show that using the SAMU-XLSR model instead of the initial XLS-R model significantly improves performance in an end-to-end SLU framework. Finally, we present the benefits of this model for language portability in SLU.
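The core idea described above (pooling frame-level XLS-R representations into a single utterance embedding and aligning it with a LaBSE sentence embedding) can be sketched as follows. This is a minimal illustration with random arrays standing in for the real encoders, not the authors' implementation: `speech_frames` is an assumed stand-in for XLS-R frame-level output, `text_vec` for a LaBSE sentence embedding, and mean pooling plus a cosine objective is one simple alignment scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_pool(frames):
    """Collapse frame-level features (T, D) into one utterance vector (D,)."""
    return frames.mean(axis=0)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical stand-ins for the real encoders (shapes chosen for illustration):
# speech_frames ~ XLS-R frame-level output, text_vec ~ LaBSE sentence embedding.
T, D = 50, 8
speech_frames = rng.normal(size=(T, D))
text_vec = rng.normal(size=(D,))

# A single utterance-level speech embedding.
utt_vec = mean_pool(speech_frames)

# Alignment objective (sketch): pull the speech embedding toward the
# text embedding by minimizing 1 - cosine similarity.
loss = 1.0 - cosine(utt_vec, text_vec)
print(f"alignment loss: {loss:.3f}")
```

In the actual SAMU-XLSR setup the pooling and alignment are learned end-to-end; the sketch only shows the shape of the objective: one vector per utterance, scored against a multilingual sentence embedding.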