Paper Title

GSEP: A robust vocal and accompaniment separation system using gated CBHG module and loudness normalization

Paper Authors

Soochul Park, Ben Sangbae Chon

Paper Abstract

In the field of audio signal processing research, source separation has been a popular research topic for a long time, and the recent adoption of deep neural networks has brought a significant improvement in performance. This improvement has vitalized the industry to productize audio deep-learning-based products and services, including karaoke in music streaming apps and dialogue enhancement in UHDTV. For these early markets, we defined a set of design principles for a vocal and accompaniment separation model in terms of robustness, quality, and cost. In this paper, we introduce GSEP (Gaudio source SEParation system), a robust vocal and accompaniment separation system using a Gated-CBHG module, mask warping, and loudness normalization. Experiments verified that the proposed system satisfies all three principles and outperforms state-of-the-art systems in both objective measures and subjective assessment.
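
The abstract names loudness normalization (alongside the Gated-CBHG module and mask warping) without describing how it is performed; purely as an illustrative sketch, the snippet below shows one common way to normalize an audio signal to a target integrated loudness using the pyloudnorm library. The -23 LUFS target, the file names, and the choice of pyloudnorm are assumptions made for this example, not details taken from the paper.

import soundfile as sf
import pyloudnorm as pyln

TARGET_LUFS = -23.0  # assumed target level for this sketch; the paper does not specify one here

def normalize_loudness(path_in: str, path_out: str) -> None:
    # Read the audio, measure its integrated loudness (ITU-R BS.1770),
    # then rescale it to the target level and write the result.
    audio, rate = sf.read(path_in)
    meter = pyln.Meter(rate)
    loudness = meter.integrated_loudness(audio)
    normalized = pyln.normalize.loudness(audio, loudness, TARGET_LUFS)
    sf.write(path_out, normalized, rate)

if __name__ == "__main__":
    normalize_loudness("mixture.wav", "mixture_norm.wav")  # hypothetical file names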
