Paper title
Impact of temporal resolution on convolutional recurrent networks for audio tagging and sound event detection
Paper authors
Abstract
Many state-of-the-art systems for audio tagging and sound event detection employ convolutional recurrent neural architectures. Typically, they are trained in a mean teacher setting to deal with the heterogeneous annotation of the available data. In this work, we present a thorough analysis of how changing the temporal resolution of these convolutional recurrent neural networks, which can be done by simply adapting their pooling operations, impacts their performance. By using a variety of evaluation metrics, we investigate the effects of adapting this design parameter under several sound recognition scenarios involving different needs in terms of temporal localization.
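
To make the link between pooling and temporal resolution concrete, the following minimal sketch computes how many output frames remain, and how much time each output frame spans, for a given stack of time-pooling factors. It is an illustration only: the function names, the 20 ms hop, the 500-frame input, and the pooling configurations are all assumptions chosen for the example, not values taken from the paper, and it assumes non-overlapping pooling (kernel equal to stride) along the time axis.

```python
from math import prod


def output_frames(input_frames: int, time_pool_factors: list[int]) -> int:
    """Frames remaining after a stack of time-pooling layers.

    Assumes each layer pools with kernel == stride along time and drops
    any incomplete trailing window (a common but hypothetical setup).
    """
    frames = input_frames
    for factor in time_pool_factors:
        frames //= factor
    return frames


def output_resolution_seconds(hop_seconds: float, time_pool_factors: list[int]) -> float:
    """Time span covered by one output frame, given the input hop size."""
    return hop_seconds * prod(time_pool_factors)


if __name__ == "__main__":
    # Illustrative values only: a 10 s clip analysed with a 20 ms hop.
    input_frames = 500
    hop_seconds = 0.02
    for factors in ([1, 1, 1], [2, 2, 1], [2, 2, 2]):
        frames = output_frames(input_frames, factors)
        resolution = output_resolution_seconds(hop_seconds, factors)
        print(f"time pooling {factors}: {frames} frames, {resolution:.2f} s per frame")
```

Under these assumptions, less pooling along time keeps more output frames, which suits tasks that need precise onsets and offsets, while more pooling yields coarser predictions that may be sufficient for clip-level tagging.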