Title
A Two-Stage Approach to Device-Robust Acoustic Scene Classification
Authors
Abstract
To improve device robustness, a highly desirable key feature of a competitive data-driven acoustic scene classification (ASC) system, a novel two-stage system based on fully convolutional neural networks (CNNs) is proposed. Our two-stage system leverages an ad-hoc score combination of two CNN classifiers: (i) the first CNN classifies acoustic inputs into one of three broad classes, and (ii) the second CNN classifies the same inputs into one of ten finer-grained classes. Three different CNN architectures are explored to implement the two-stage classifiers, and a frequency sub-sampling scheme is investigated. Moreover, novel data augmentation schemes for ASC are investigated. Evaluated on DCASE 2020 Task 1a, our results show that the proposed ASC system attains state-of-the-art accuracy on the development set, where our best system, a two-stage fusion of CNN ensembles, delivers an 81.9% average accuracy on multi-device test data and obtains a significant improvement on unseen devices. Finally, neural saliency analysis with class activation mapping (CAM) gives new insights into the patterns learnt by our models.
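The abstract only says the two CNNs are combined through an "ad-hoc score combination". The following is a minimal illustrative sketch, not the authors' implementation. It assumes that each fine-class score is multiplied by the score of the broad class it belongs to and then renormalized. It also assumes that the ten DCASE 2020 Task 1a labels are grouped into indoor, outdoor, and transportation (the three-class split used in DCASE 2020 Task 1b). The names `COARSE_OF`, `two_stage_scores`, and `predict` are hypothetical.

```python
from typing import Dict

# Assumed grouping of the ten fine-grained scene labels into three broad classes.
COARSE_OF: Dict[str, str] = {
    "airport": "indoor",
    "shopping_mall": "indoor",
    "metro_station": "indoor",
    "street_pedestrian": "outdoor",
    "public_square": "outdoor",
    "street_traffic": "outdoor",
    "park": "outdoor",
    "bus": "transportation",
    "tram": "transportation",
    "metro": "transportation",
}


def two_stage_scores(fine: Dict[str, float], coarse: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical fusion: weight each fine-class score by its broad-class score, then renormalize."""
    fused = {label: p * coarse[COARSE_OF[label]] for label, p in fine.items()}
    total = sum(fused.values())
    return {label: v / total for label, v in fused.items()}


def predict(fine: Dict[str, float], coarse: Dict[str, float]) -> str:
    """Return the fine-grained label with the highest fused score."""
    fused = two_stage_scores(fine, coarse)
    return max(fused, key=fused.get)


# Example: the 10-class CNN slightly prefers "metro" over "metro_station",
# but the 3-class CNN is fairly sure the scene is indoor.
others = [c for c in COARSE_OF if c not in ("metro", "metro_station")]
fine_scores = {c: 0.38 / len(others) for c in others}
fine_scores.update({"metro_station": 0.30, "metro": 0.32})
coarse_scores = {"indoor": 0.7, "outdoor": 0.1, "transportation": 0.2}

if __name__ == "__main__":
    print(max(fine_scores, key=fine_scores.get))   # 10-class CNN alone
    print(predict(fine_scores, coarse_scores))     # after two-stage fusion
```

In this toy example, the broad-class score resolves a near-tie between two acoustically similar fine classes. This is the kind of confusion that a coarse-to-fine combination is meant to reduce. The product rule is only one plausible choice. A weighted sum of log-scores would be an equally simple alternative.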