A Novel Temporal Attentive-Pooling based Convolutional Recurrent Architecture for Acoustic Signal Enhancement

Paper Title

A Novel Temporal Attentive-Pooling based Convolutional Recurrent Architecture for Acoustic Signal Enhancement

Authors

Tassadaq Hussain, Wei-Chien Wang, Mandar Gogate, Kia Dashtipour, Yu Tsao, Xugang Lu, Adeel Ahsan, Amir Hussain

Abstract

In acoustic signal processing, target signals usually carry semantic information that is encoded in a hierarchical structure of short- and long-term contexts. Background noise, however, distorts these structures in a nonuniform way. Existing deep acoustic signal enhancement (ASE) architectures ignore these local and global effects. To address this problem, we propose integrating a novel temporal attentive-pooling (TAP) mechanism into a conventional convolutional recurrent neural network; the resulting model is termed TAP-CRNN. The proposed approach considers both global and local attention for ASE tasks. Specifically, we first use a convolutional layer to extract local information from the acoustic signals and then a recurrent neural network (RNN) to characterize temporal contextual information. Second, we exploit a novel attention mechanism to contextually process salient regions of the noisy signals. The proposed ASE system is evaluated on a benchmark infant cry dataset and compared with several well-known methods. The results show that TAP-CRNN more effectively reduces noise components in infant cry signals under unseen background noises at challenging signal-to-noise levels.
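The abstract does not give the TAP formulation, so the following is only a generic sketch of attention-weighted pooling over time, not the authors' method. It assumes the recurrent layer has already produced one feature vector per frame, and it scores each frame against a hypothetical `query` vector that would be learned in a real model. The names `softmax`, `attentive_pool`, `frames`, and `query` are all illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_pool(frames, query):
    """Pool T frame vectors into a single context vector.

    frames: list of T feature vectors of length D (e.g. RNN outputs).
    query:  hypothetical scoring vector of length D (an assumption;
            in a trained model this would be a learned parameter).
    """
    # Score each frame by its dot product with the query vector.
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    # Normalize the scores over time so the weights sum to 1.
    weights = softmax(scores)
    # Weighted sum over time: higher-scoring frames contribute more.
    dim = len(frames[0])
    context = [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return context, weights


# Toy input: T=3 frames with D=2 features each (made-up numbers).
frames = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
query = [1.0, 1.0]
context, weights = attentive_pool(frames, query)
```

In this toy input the second frame has the largest score, so it receives the largest weight and dominates the pooled vector. That is the sense in which attention lets a model emphasize salient regions of a noisy signal.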
