Speech-to-Speech Translation For A Real-world Unwritten Language

Paper Title

Speech-to-Speech Translation For A Real-world Unwritten Language

Authors

Peng-Jen Chen, Kevin Tran, Yilin Yang, Jingfei Du, Justine Kao, Yu-An Chung, Paden Tomasello, Paul-Ambroise Duquenne, Holger Schwenk, Hongyu Gong, Hirofumi Inaguma, Sravya Popuri, Changhan Wang, Juan Pino, Wei-Ning Hsu, Ann Lee

Abstract

We study speech-to-speech translation (S2ST), which translates speech from one language into another, and focus on building systems that support languages without standard text writing systems. We use English-Taiwanese Hokkien as a case study and present an end-to-end solution spanning training data collection, modeling choices, and benchmark dataset release. First, we present efforts on creating human-annotated data, automatically mining data from large unlabeled speech datasets, and adopting pseudo-labeling to produce weakly supervised data. On the modeling side, we take advantage of recent advances in applying self-supervised discrete representations as prediction targets in S2ST, and show the effectiveness of leveraging additional text supervision from Mandarin, a language similar to Hokkien, in model training. Finally, we release an S2ST benchmark set to facilitate future research in this field. The demo can be found at https://huggingface.co/spaces/facebook/Hokkien_Translation.
