Paper Title
A Novel Temporal Attentive-Pooling based Convolutional Recurrent Architecture for Acoustic Signal Enhancement
Paper Authors
Paper Abstract
In acoustic signal processing, the target signals usually carry semantic information that is encoded in a hierarchical structure of short-term and long-term contexts. Background noise, however, distorts these structures in a nonuniform way. Existing deep acoustic signal enhancement (ASE) architectures ignore this combination of local and global effects. To address this problem, we propose integrating a novel temporal attentive-pooling (TAP) mechanism into a conventional convolutional recurrent neural network; we term the resulting model TAP-CRNN. The proposed approach considers both global and local attention for ASE tasks. Specifically, we first use a convolutional layer to extract local information from the acoustic signals, and then use a recurrent neural network (RNN) architecture to characterize temporal contextual information. Second, we exploit a novel attention mechanism to contextually process salient regions of the noisy signals. The proposed ASE system is evaluated on a benchmark infant cry dataset and compared with several well-known methods. The results show that TAP-CRNN can more effectively reduce noise components in infant cry signals under unseen background noises at challenging signal-to-noise levels.
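The abstract does not give the TAP equations, so the following is only a minimal sketch of generic attention-weighted pooling over time, not the authors' formulation. It assumes that each frame has already been turned into a feature vector (for example by the convolutional and recurrent layers), scores each frame with a dot product against a hypothetical `query` vector, and averages the frames using softmax weights. The names `attentive_pool`, `frames`, and `query`, and the toy values, are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_pool(frames, query):
    """Pool a sequence of frame vectors into one context vector.

    frames: list of T feature vectors (e.g. RNN outputs), each of length D.
    query:  hypothetical scoring vector of length D (learned in practice).
    Returns (context, weights); the weights sum to 1 over the T frames.
    """
    # One scalar relevance score per frame (dot product with the query).
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    # Weighted average of the frames, computed per feature dimension.
    context = [sum(w * h[d] for w, h in zip(weights, frames)) for d in range(dim)]
    return context, weights


# Toy input: three frames of 2-dimensional features (made-up values).
frames = [[0.0, 1.0], [2.0, 0.0], [0.5, 0.5]]
query = [1.0, 0.0]
context, weights = attentive_pool(frames, query)
```

In this toy case the second frame has the largest score, so it receives the largest weight and dominates the pooled vector. In a trained model the scoring parameters would be learned jointly with the rest of the network.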