Data Augmentation and Squeeze-and-Excitation Network on Multiple Dimension for Sound Event Localization and Detection in Real Scenes

Paper Title

Data Augmentation and Squeeze-and-Excitation Network on Multiple Dimension for Sound Event Localization and Detection in Real Scenes

Authors

Ko, Byeong-Yun; Nam, Hyeonuk; Kim, Seong-Hu; Min, Deokki; Choi, Seung-Deok; Park, Yong-Hwa

Abstract

The performance of sound event localization and detection (SELD) in real scenes is limited by the small size of SELD datasets, owing to the difficulty of obtaining a sufficient amount of realistic multi-channel audio recordings with accurate labels. We used two main strategies to address the problems arising from the small real SELD dataset. First, we applied various data augmentation methods to all data dimensions: channel, frequency, and time. We also propose an original data augmentation method named Moderate Mixup to simulate situations where a noise floor or interfering events exist. Second, we applied Squeeze-and-Excitation blocks on the channel and frequency dimensions to extract feature characteristics efficiently. Our trained models achieved best ER, F1, LE, and LR scores of 0.53, 49.8%, 16.0°, and 56.2%, respectively, on the STARSS22 test dataset.
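
The abstract mentions Squeeze-and-Excitation on both the channel and frequency dimensions without giving details. The following is a minimal NumPy sketch of the general Squeeze-and-Excitation idea (squeeze by averaging, excite with a small bottleneck network, then rescale), applied first along the channel axis and then along the frequency axis. It is not the authors' implementation: the (channel, frequency, time) layout, the reduction ratio `r`, the random weights, the sequential ordering of the two gates, and the helper name `se_gate` are all assumptions made for illustration.

```python
import numpy as np


def se_gate(x, axis, w1, w2):
    """Apply a Squeeze-and-Excitation gate along one axis of x (hypothetical helper)."""
    # Squeeze: average over every axis except the one being gated.
    other_axes = tuple(i for i in range(x.ndim) if i != axis)
    squeezed = x.mean(axis=other_axes)              # shape (n,)
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate.
    hidden = np.maximum(w1 @ squeezed, 0.0)         # shape (n // r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # shape (n,), values in [0, 1]
    # Scale: broadcast the gate back along the chosen axis.
    shape = [1] * x.ndim
    shape[axis] = x.shape[axis]
    return x * gate.reshape(shape)


# Assumed toy sizes: 8 feature channels, 16 frequency bins, 20 time frames.
rng = np.random.default_rng(0)
C, F, T, r = 8, 16, 20, 4
x = rng.standard_normal((C, F, T))

# Random weights stand in for parameters that would normally be learned.
wc1, wc2 = rng.standard_normal((C // r, C)), rng.standard_normal((C, C // r))
wf1, wf2 = rng.standard_normal((F // r, F)), rng.standard_normal((F, F // r))

y = se_gate(x, 0, wc1, wc2)   # gate along the channel axis
y = se_gate(y, 1, wf1, wf2)   # gate along the frequency axis
```

Because each gate is a sigmoid output, the block can only attenuate entries of the feature map, never amplify them; in a trained network the weights would be learned so that informative channels and frequency bins are attenuated least.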
