Paper title
The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans
Paper authors
Paper abstract
This paper describes recent developments in ESPnet (https://github.com/espnet/espnet), an end-to-end speech processing toolkit. The project was initiated in December 2017, mainly to support end-to-end speech recognition experiments based on sequence-to-sequence modeling. It has grown rapidly and now covers a wide range of speech processing applications: ESPnet also includes text-to-speech (TTS), voice conversion (VC), speech translation (ST), and speech enhancement (SE) with support for beamforming, speech separation, denoising, and dereverberation. All applications are trained in an end-to-end manner, thanks to the generic sequence-to-sequence modeling properties, and they can be further integrated and jointly optimized. ESPnet also provides reproducible all-in-one recipes for these applications, with state-of-the-art performance on various benchmarks achieved by incorporating Transformer, advanced data augmentation, and Conformer. The project aims to provide an up-to-date speech processing experience to the community so that researchers in academia and in industry at various scales can develop their technologies collaboratively.