Paper title
Power pooling: An adaptive pooling function for weakly labelled sound event detection
Authors
Abstract
In engineering applications, access to large corpora with strongly labelled sound events is expensive and difficult. Much research has therefore turned to the problem of detecting both the types and the timestamps of sound events from weak labels that specify only the types. This task can be treated as a multiple instance learning (MIL) problem, in which the key design choice is the pooling function. In this paper, we propose an adaptive power pooling function that automatically adapts to various sound sources. On two public datasets, the proposed power pooling function outperforms the state-of-the-art linear softmax pooling on both coarse-grained and fine-grained metrics. Notably, it improves the event-based F1 score (which evaluates the detection of event onsets and offsets) by 11.4% and 10.2% relative on the two datasets. While this paper focuses on sound event detection, the proposed method can be applied to MIL tasks in other domains.
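To make the role of the pooling function concrete, here is a minimal sketch of how frame-level probabilities for one sound class could be pooled into a single clip-level probability. The abstract does not state the power pooling formula, so the form used here is an assumption: a weighted average in which each frame probability p is weighted by p**n. With n = 1 this reduces to the commonly used definition of linear softmax pooling, with n = 0 to plain averaging, and with large n it approaches max pooling. The function names and the fixed exponent are hypothetical; in an adaptive version the exponent would presumably be learned rather than passed in.

```python
from typing import Sequence


def power_pool(frame_probs: Sequence[float], n: float = 1.0) -> float:
    """Pool frame-level probabilities into one clip-level probability.

    Assumed form (not taken from the abstract): a weighted average of the
    frame probabilities, where each frame's weight is its probability
    raised to the power n.
    """
    weights = [p ** n for p in frame_probs]
    total = sum(weights)
    if total == 0.0:
        # All weights are zero (every frame probability is 0 and n > 0).
        return 0.0
    return sum(p * w for p, w in zip(frame_probs, weights)) / total


def linear_softmax_pool(frame_probs: Sequence[float]) -> float:
    """Linear softmax pooling: sum(p^2) / sum(p), i.e. the n = 1 case."""
    return power_pool(frame_probs, n=1.0)


if __name__ == "__main__":
    probs = [0.1, 0.2, 0.9]  # illustrative frame probabilities for one class
    for n in (0.0, 1.0, 5.0, 50.0):
        print(f"n={n}: {power_pool(probs, n):.4f}")
```

A larger exponent lets the highest-probability frames dominate the clip-level prediction, while a smaller one spreads the weight across all frames. Under this assumed form, that single parameter is what would let the pooling behave differently for short and long sound events.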