A Two-Stage Approach to Device-Robust Acoustic Scene Classification

Paper Title

A Two-Stage Approach to Device-Robust Acoustic Scene Classification

Authors

Hu, Hu; Yang, Chao-Han Huck; Xia, Xianjun; Bai, Xue; Tang, Xin; Wang, Yajian; Niu, Shutong; Chai, Li; Li, Juanjuan; Zhu, Hongning; Bao, Feng; Zhao, Yuanjun; Siniscalchi, Sabato Marco; Wang, Yannan; Du, Jun; Lee, Chin-Hui

Abstract

To improve device robustness, a highly desirable key feature of a competitive data-driven acoustic scene classification (ASC) system, a novel two-stage system based on fully convolutional neural networks (CNNs) is proposed. Our two-stage system leverages an ad-hoc score combination based on two CNN classifiers: (i) the first CNN classifies acoustic inputs into one of three broad classes, and (ii) the second CNN classifies the same inputs into one of ten finer-grained classes. Three different CNN architectures are explored to implement the two-stage classifiers, and a frequency sub-sampling scheme is investigated. Moreover, novel data augmentation schemes for ASC are also investigated. Evaluated on DCASE 2020 Task 1a, our results show that the proposed ASC system attains state-of-the-art accuracy on the development set, where our best system, a two-stage fusion of CNN ensembles, delivers an 81.9% average accuracy on multi-device test data and obtains a significant improvement on unseen devices. Finally, neural saliency analysis with class activation mapping (CAM) gives new insights into the patterns learnt by our models.
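The score combination described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact fusion rule, so the code below assumes one plausible choice: each fine-class probability is multiplied by the probability of its broad parent class. The class names, the grouping in `BROAD_OF`, and the helpers `fuse_scores` and `classify` are hypothetical and are not taken from the paper.

```python
# Assumed grouping of ten fine scene classes into three broad classes.
# The names and the mapping are illustrative, not taken from the abstract.
BROAD_OF = {
    "airport": "indoor",
    "shopping_mall": "indoor",
    "metro_station": "indoor",
    "park": "outdoor",
    "public_square": "outdoor",
    "street_pedestrian": "outdoor",
    "street_traffic": "outdoor",
    "bus": "transportation",
    "metro": "transportation",
    "tram": "transportation",
}


def fuse_scores(broad_probs, fine_probs):
    """Weight each fine-class score by the score of its broad parent class.

    This product rule is an assumption; the abstract only says the
    combination is ad hoc.
    """
    return {c: broad_probs[BROAD_OF[c]] * p for c, p in fine_probs.items()}


def classify(broad_probs, fine_probs):
    """Return the fine class with the highest fused score."""
    fused = fuse_scores(broad_probs, fine_probs)
    return max(fused, key=fused.get)


# Made-up posteriors standing in for the outputs of the two CNNs.
broad = {"indoor": 0.7, "outdoor": 0.2, "transportation": 0.1}
fine = {c: 0.05625 for c in BROAD_OF}
fine["metro"] = 0.30          # the 10-class model alone prefers "metro"
fine["metro_station"] = 0.25  # runner-up in the 10-class model

print(max(fine, key=fine.get))  # metro
print(classify(broad, fine))    # metro_station
```

In this toy case the 10-class model alone would pick "metro", but the 3-class model's strong "indoor" score shifts the fused decision to "metro_station", which is the kind of correction a two-stage combination is meant to provide.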
