Paper Title
MetaAudio: A Few-Shot Audio Classification Benchmark
Paper Authors
Paper Abstract
Currently available benchmarks for few-shot learning (machine learning with few training examples) are limited in the domains they cover, primarily focusing on image classification. This work aims to alleviate this reliance on image-based benchmarks by offering the first comprehensive, public and fully reproducible audio-based alternative, covering a variety of sound domains and experimental settings. We compare the few-shot classification performance of a variety of techniques on seven audio datasets (spanning environmental sounds to human speech). Extending this, we carry out in-depth analyses of joint training (where all datasets are used during training) and cross-dataset adaptation protocols, establishing the possibility of a generalised audio few-shot classification algorithm. Our experiments show that gradient-based meta-learning methods such as MAML and Meta-Curvature consistently outperform both metric and baseline methods. We also demonstrate that the joint training routine helps overall generalisation for the environmental sound databases included, as well as being a somewhat effective method of tackling the cross-dataset/domain setting.
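The gradient-based meta-learning results highlighted in the abstract centre on MAML-style adaptation over N-way K-shot episodes. As a rough illustration of that setup only (not the paper's actual implementation or the MetaAudio codebase), the sketch below runs one first-order MAML meta-update on a single synthetic audio episode in PyTorch; the `AudioEncoder`, spectrogram shapes, learning rates, and episode sizes are assumptions made for the example.

```python
# Illustrative sketch: one first-order MAML meta-update on an N-way K-shot
# "audio" episode. Model, input shapes, and hyperparameters are invented for
# illustration and are not taken from the MetaAudio benchmark code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AudioEncoder(nn.Module):
    """Tiny CNN over log-mel-spectrogram-like inputs (batch, 1, mels, frames)."""

    def __init__(self, n_way: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, n_way)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))


def maml_episode(model, support_x, support_y, query_x, query_y,
                 inner_lr=0.4, inner_steps=1):
    """Adapt a functional copy of the model on the support set, return query loss.

    First-order variant: inner-loop gradients are detached, so the outer
    optimiser only differentiates through the adapted parameters directly.
    """
    adapted = {name: p.clone() for name, p in model.named_parameters()}
    for _ in range(inner_steps):
        logits = torch.func.functional_call(model, adapted, (support_x,))
        loss = F.cross_entropy(logits, support_y)
        grads = torch.autograd.grad(loss, list(adapted.values()))
        adapted = {name: p - inner_lr * g.detach()  # first-order approximation
                   for (name, p), g in zip(adapted.items(), grads)}
    query_logits = torch.func.functional_call(model, adapted, (query_x,))
    return F.cross_entropy(query_logits, query_y)


if __name__ == "__main__":
    n_way, k_shot, n_query = 5, 1, 5
    model = AudioEncoder(n_way)
    outer_opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Dummy episode: random tensors standing in for real spectrogram features.
    support_x = torch.randn(n_way * k_shot, 1, 64, 128)
    support_y = torch.arange(n_way).repeat_interleave(k_shot)
    query_x = torch.randn(n_way * n_query, 1, 64, 128)
    query_y = torch.arange(n_way).repeat_interleave(n_query)

    outer_opt.zero_grad()
    loss = maml_episode(model, support_x, support_y, query_x, query_y)
    loss.backward()   # outer (meta) gradient through the adapted parameters
    outer_opt.step()
    print(f"query loss after one meta-update: {loss.item():.3f}")
```

In a full benchmark run, this per-episode update would be repeated over many episodes sampled from the meta-training classes of each dataset (or, for the joint-training protocol described above, from all datasets pooled together), with evaluation episodes drawn from held-out classes.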