FeatherTTS: Robust and Efficient Attention-Based Neural TTS

Paper title

FeatherTTS: Robust and Efficient attention based Neural TTS

Authors

Tian, Qiao; Zhang, Zewang; Liu, Chao; Lu, Heng; Chen, Linghui; Wei, Bin; He, Pujiang; Liu, Shan

Abstract

Attention-based neural TTS is an elegant speech synthesis pipeline and has shown a powerful ability to generate natural speech. However, it is still not robust enough to meet the stability requirements of industrial products. It also suffers from slow inference owing to the autoregressive generation process. In this work, we propose FeatherTTS, a robust and efficient attention-based neural TTS system. First, we propose a novel Gaussian attention that exploits the interpretability of Gaussian attention and the strict monotonic property of TTS alignments. With this method, we replace the commonly used stop token prediction architecture with attentive stop prediction. Second, we apply block sparsity to the autoregressive decoder to speed up speech synthesis. The experimental results show that the proposed FeatherTTS not only nearly eliminates word skipping and repeating on particularly hard texts while keeping the naturalness of the generated speech, but also speeds up acoustic feature generation by 3.5 times over Tacotron. Overall, the proposed FeatherTTS can be 35x faster than real time on a single CPU.
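
The abstract does not give the attention equations, so the following is only a minimal sketch of how a monotonic Gaussian attention with an attention-driven stop rule could look. Everything in it is an assumption made for illustration, not the authors' formulation: the softplus parameterization of the step and width, the normalization of the weights, the stop condition (stop once the Gaussian mean passes the last input token), and the names `gaussian_attention_step` and `decode`.

```python
import math


def softplus(x):
    # Numerically stable softplus; the result is always positive.
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)


def gaussian_attention_step(prev_mean, raw_delta, raw_sigma, num_tokens):
    """One decoder step of an illustrative monotonic Gaussian attention.

    raw_delta and raw_sigma stand in for values a decoder network would
    predict (hypothetical inputs, not taken from the paper).
    """
    # A positive increment forces the mean to move strictly forward.
    mean = prev_mean + softplus(raw_delta)
    # A small floor keeps the width strictly positive.
    sigma = softplus(raw_sigma) + 1e-3
    scores = [
        math.exp(-((j - mean) ** 2) / (2.0 * sigma ** 2))
        for j in range(num_tokens)
    ]
    total = sum(scores) or 1.0  # guard against underflow to all zeros
    weights = [s / total for s in scores]  # normalization is an assumption
    return mean, weights


def decode(raw_params, num_tokens, max_steps=1000):
    """Collect attention rows until the mean passes the last token.

    The stop rule replaces a learned stop token in this sketch: decoding
    ends when the attention mean moves beyond the final input position.
    """
    mean = 0.0
    alignment = []
    for step, (raw_delta, raw_sigma) in enumerate(raw_params):
        if step >= max_steps:
            break
        mean, weights = gaussian_attention_step(
            mean, raw_delta, raw_sigma, num_tokens
        )
        alignment.append(weights)
        if mean > num_tokens - 1:  # hypothetical attentive stop condition
            break
    return alignment
```

Because the mean can only increase, the alignment in this sketch cannot jump backward, and the stop decision is read directly from the attention position rather than from a separate classifier.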
