Paper Title
GLD-Net: Improving Monaural Speech Enhancement by Learning Global and Local Dependency Features with GLD Block
Paper Authors
Paper Abstract
For monaural speech enhancement, contextual information is important for accurate speech estimation. However, commonly used convolutional neural networks (CNNs) are weak at capturing temporal context, since their building blocks process only one local neighborhood at a time. To address this problem, we draw on human auditory perception and introduce a two-stage trainable reasoning mechanism, referred to as the global-local dependency (GLD) block. GLD blocks capture long-term dependencies among time-frequency bins at both the global and local levels of the noisy spectrogram, helping to detect correlations among the speech part, the noise part, and the whole noisy input. Moreover, we construct a monaural speech enhancement network called GLD-Net, which adopts an encoder-decoder architecture and consists of a speech object branch, an interference branch, and a global noisy branch. In each branch, the extracted global-level and local-level speech features are efficiently reasoned over and aggregated. We compare the proposed GLD-Net with existing state-of-the-art methods on the WSJ0 and DEMAND datasets. The results show that GLD-Net outperforms the state-of-the-art methods in terms of PESQ and STOI.
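Since the abstract describes the GLD block only at a high level, the following is a minimal, illustrative PyTorch sketch of one way such a block could combine global dependency modeling (non-local self-attention over all time-frequency bins) with local dependency modeling (a small-neighborhood convolution). The module name GLDBlock, the layer sizes, and the fusion-by-summation scheme are assumptions made for illustration, not the authors' exact design.

```python
# Illustrative sketch of a GLD-style block (assumed design, not the paper's exact module):
# a global stage approximated with non-local self-attention over flattened T-F bins,
# followed by a local stage using a depthwise 3x3 convolution, fused by summation.
import torch
import torch.nn as nn


class GLDBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 2):
        super().__init__()
        inner = channels // reduction
        # Global stage: 1x1 projections for query/key/value over all T-F bins.
        self.query = nn.Conv2d(channels, inner, kernel_size=1)
        self.key = nn.Conv2d(channels, inner, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        # Local stage: depthwise 3x3 convolution over a small T-F neighborhood.
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, groups=channels),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time, freq) feature map from the noisy spectrogram.
        b, c, t, f = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (b, t*f, inner)
        k = self.key(x).flatten(2)                      # (b, inner, t*f)
        v = self.value(x).flatten(2).transpose(1, 2)    # (b, t*f, c)
        attn = torch.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1)
        global_feat = (attn @ v).transpose(1, 2).reshape(b, c, t, f)
        # Aggregate global and local dependency features with a residual connection.
        return x + global_feat + self.local(x)


if __name__ == "__main__":
    block = GLDBlock(channels=16)
    spec_feat = torch.randn(2, 16, 100, 64)   # (batch, channels, time, freq)
    print(block(spec_feat).shape)             # torch.Size([2, 16, 100, 64])
```

In the paper's full network, blocks of this kind would presumably sit inside each of the three branches of the encoder-decoder, so that every branch reasons over both global and local dependencies before the branch outputs are aggregated.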