Paper Title

EfficientLEAF: A Faster LEarnable Audio Frontend of Questionable Use

Authors

Jan Schlüter, Gerald Gutenbrunner

Abstract

In audio classification, differentiable auditory filterbanks with few parameters cover the middle ground between hard-coded spectrograms and raw audio. LEAF (arXiv:2101.08596), a Gabor-based filterbank combined with Per-Channel Energy Normalization (PCEN), has shown promising results, but is computationally expensive. With inhomogeneous convolution kernel sizes and strides, and by replacing PCEN with better parallelizable operations, we can reach similar results more efficiently. In experiments on six audio classification tasks, our frontend matches the accuracy of LEAF at 3% of the cost, but both fail to consistently outperform a fixed mel filterbank. The quest for learnable audio frontends is not solved.
