Paper Title


Cross-modal Contrastive Learning for Speech Translation

Authors

Rong Ye, Mingxuan Wang, Lei Li

Abstract


How can we learn unified representations for spoken utterances and their written text? Learning similar representations for semantically similar speech and text is important for speech translation. To this end, we propose ConST, a cross-modal contrastive learning method for end-to-end speech-to-text translation. We evaluate ConST and a variety of previous baselines on the popular benchmark MuST-C. Experiments show that the proposed ConST consistently outperforms previous methods, achieving an average BLEU of 29.4. Further analysis verifies that ConST indeed closes the representation gap between the two modalities -- its learned representations improve the accuracy of cross-modal speech-text retrieval from 4% to 88%. Code and models are available at https://github.com/ReneeYe/ConST.
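The cross-modal contrastive objective described above can be illustrated with an InfoNCE-style loss: paired speech and text embeddings in a batch are pulled together, while all other in-batch pairings serve as negatives. The sketch below is an illustrative NumPy implementation under common assumptions (cosine similarity, in-batch negatives, a `temperature` hyperparameter), not the authors' exact code; see their repository for the real implementation.

```python
import numpy as np

def info_nce_loss(speech_emb, text_emb, temperature=0.1):
    """InfoNCE-style cross-modal contrastive loss (illustrative sketch).

    speech_emb, text_emb: (batch, dim) arrays; row i of each is a
    matched speech-text pair. Matched pairs are treated as positives,
    all other in-batch pairs as negatives.
    """
    # L2-normalize so dot products become cosine similarities
    s = speech_emb / np.linalg.norm(speech_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = s @ t.T / temperature  # (batch, batch) similarity matrix

    # Softmax cross-entropy with the diagonal (matched pairs) as targets
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# Sanity check: well-aligned speech/text embeddings yield a lower loss
# than random, unaligned ones.
rng = np.random.default_rng(0)
text = rng.standard_normal((4, 16))
loss_aligned = info_nce_loss(text + 0.01 * rng.standard_normal((4, 16)), text)
loss_random = info_nce_loss(rng.standard_normal((4, 16)), text)
print(loss_aligned < loss_random)
```

Minimizing this loss drives matched speech-text pairs toward similar representations, which is the mechanism behind the retrieval-accuracy improvement (4% to 88%) reported in the abstract.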
