Speakerfilter-Pro: an improved target speaker extractor combines the time domain and frequency domain

Paper title

Speakerfilter-Pro: an improved target speaker extractor combines the time domain and frequency domain

Authors

Shulin He, Hao Li, Xueliang Zhang

Abstract

This paper introduces an improved target speaker extractor, referred to as Speakerfilter-Pro, based on our previous Speakerfilter model. The Speakerfilter uses a bidirectional gated recurrent unit (BGRU) module to characterize the target speaker from anchor speech and a convolutional recurrent network (CRN) module to separate the target speech from a noisy signal. Unlike the Speakerfilter, the Speakerfilter-Pro adds a WaveUNet module at the beginning and another at the end of the network. The WaveUNet has been shown to perform speech separation well in the time domain. To extract the target speaker information better, the complex spectrum, instead of the magnitude spectrum, is used as the input feature for the CRN module. Experiments are conducted on the two-speaker dataset (WSJ0-mix2), which is widely used for speaker extraction. The systematic evaluation shows that the Speakerfilter-Pro outperforms the Speakerfilter and other baselines, and achieves a signal-to-distortion ratio (SDR) of 14.95 dB.
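The difference between a magnitude-spectrum input and a complex-spectrum input can be illustrated with a small NumPy sketch. This is not the authors' implementation. The frame length, hop size, Hann window, and all function names below are assumptions chosen for illustration, and `simple_sdr_db` is a simplified energy-ratio form of SDR rather than the evaluation metric reported in the paper.

```python
import numpy as np


def stft(signal, frame_len=512, hop=256):
    """Short-time Fourier transform with a Hann window.

    Returns a complex array of shape (frames, frequency_bins).
    frame_len and hop are illustrative values, not the paper's settings.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=-1)


def magnitude_feature(spec):
    """Magnitude spectrum: phase is discarded. Shape (frames, bins)."""
    return np.abs(spec)


def complex_feature(spec):
    """Complex spectrum as two real channels (real, imaginary).

    Shape (2, frames, bins). Phase information is preserved.
    """
    return np.stack([spec.real, spec.imag], axis=0)


def simple_sdr_db(reference, estimate):
    """Simplified SDR in dB: reference energy over residual energy."""
    residual = reference - estimate
    return 10.0 * np.log10(np.sum(reference**2) / np.sum(residual**2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mixture = rng.standard_normal(16000)  # stand-in for one second of audio at 16 kHz
    spec = stft(mixture)
    print(magnitude_feature(spec).shape)
    print(complex_feature(spec).shape)
    print(simple_sdr_db(mixture, 0.9 * mixture))
```

The two-channel layout keeps both real and imaginary parts, so the magnitude can still be recovered from it while the phase remains available to the network.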
