Paper Title

Align, Write, Re-order: Explainable End-to-End Speech Translation via Operation Sequence Generation

Paper Authors

Motoi Omachi, Brian Yan, Siddharth Dalmia, Yuya Fujita, Shinji Watanabe

Paper Abstract

The black-box nature of end-to-end speech translation (E2E ST) systems makes it difficult to understand how source language inputs are being mapped to the target language. To solve this problem, we would like to simultaneously generate automatic speech recognition (ASR) and ST predictions such that each source language word is explicitly mapped to a target language word. A major challenge arises from the fact that translation is a non-monotonic sequence transduction task due to word ordering differences between languages -- this clashes with the monotonic nature of ASR. Therefore, we propose to generate ST tokens out-of-order while remembering how to re-order them later. We achieve this by predicting a sequence of tuples consisting of a source word, the corresponding target words, and post-editing operations dictating the correct insertion points for the target word. We examine two variants of such operation sequences which enable generation of monotonic transcriptions and non-monotonic translations from the same speech input simultaneously. We apply our approach to offline and real-time streaming models, demonstrating that we can provide explainable translations without sacrificing quality or latency. In fact, the delayed re-ordering ability of our approach improves performance during streaming. As an added benefit, our method performs ASR and ST simultaneously, making it faster than using two separate systems to perform these tasks.
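To make the operation-sequence idea more concrete, below is a minimal sketch of how a decoded sequence of tuples could be turned back into a monotonic transcript and a re-ordered translation. The tuple format (source word, target words, insertion index) and the `reconstruct` helper are simplified assumptions for illustration only; the paper defines two specific operation-sequence variants that are not reproduced here.

```python
# Illustrative sketch only: the (src_word, tgt_words, insert_pos) format and the
# "insert into the translation built so far" semantics are assumptions, not the
# paper's exact operation vocabulary.
from typing import List, Tuple

def reconstruct(ops: List[Tuple[str, List[str], int]]) -> Tuple[str, str]:
    """Rebuild the monotonic transcript and the re-ordered translation
    from a sequence of (source word, target words, insertion index) tuples."""
    transcript: List[str] = []   # source words, emitted in speech order (ASR side)
    translation: List[str] = []  # target words, placed at their insertion points (ST side)

    for src_word, tgt_words, insert_pos in ops:
        transcript.append(src_word)  # transcription stays monotonic
        for offset, tgt_word in enumerate(tgt_words):
            # target words may land before previously emitted words,
            # which is how non-monotonic re-ordering is expressed
            translation.insert(insert_pos + offset, tgt_word)

    return " ".join(transcript), " ".join(translation)

# Hypothetical English->German example where the object is inserted
# before an already-emitted verb (delayed re-ordering):
ops = [
    ("I",    ["Ich"],     0),
    ("have", ["habe"],    1),
    ("seen", ["gesehen"], 2),
    ("him",  ["ihn"],     2),  # "ihn" is placed before "gesehen"
]
print(reconstruct(ops))  # ('I have seen him', 'Ich habe ihn gesehen')
```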
