Deformable CNN and Imbalance-Aware Feature Learning for Singing Technique Classification

Title

Deformable CNN and Imbalance-Aware Feature Learning for Singing Technique Classification

Authors

Yuya Yamamoto, Juhan Nam, Hiroko Terasawa

Abstract

Singing techniques are used for expressive vocal performances by employing temporal fluctuations of the timbre, the pitch, and other components of the voice. Their classification is a challenging task, because of mainly two factors: 1) the fluctuations in singing techniques have a wide variety and are affected by many factors and 2) existing datasets are imbalanced. To deal with these problems, we developed a novel audio feature learning method based on deformable convolution with decoupled training of the feature extractor and the classifier using a class-weighted loss function. The experimental results show the following: 1) the deformable convolution improves the classification results, particularly when it is applied to the last two convolutional layers, and 2) both re-training the classifier and weighting the cross-entropy loss function by a smoothed inverse frequency enhance the classification performance.
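
The class-weighting idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact smoothing formula, so the power-law smoothing below (weights proportional to `count ** -power`, with `power=0.5`), the normalisation to a mean weight of 1, and the function names are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def smoothed_inverse_frequency_weights(class_counts, power=0.5):
    """Per-class weights proportional to count ** -power.

    ASSUMPTION: power-law smoothing. power=1.0 gives plain inverse
    frequency, power=0.0 gives uniform weights. The paper's exact
    formula is not given in the abstract.
    """
    counts = np.asarray(class_counts, dtype=np.float64)
    if np.any(counts <= 0):
        raise ValueError("every class needs at least one sample")
    weights = counts ** (-power)
    # Rescale so the weights average to 1 across classes.
    return weights * len(counts) / weights.sum()


def weighted_cross_entropy(logits, labels, class_weights):
    """Class-weighted cross-entropy, averaged over the sample weights."""
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    class_weights = np.asarray(class_weights, dtype=np.float64)
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    per_sample = -log_probs[np.arange(len(labels)), labels]
    sample_weights = class_weights[labels]
    return float((sample_weights * per_sample).sum() / sample_weights.sum())


# Hypothetical class counts for an imbalanced three-class dataset.
weights = smoothed_inverse_frequency_weights([100, 25, 4])
loss = weighted_cross_entropy(np.zeros((4, 3)), [0, 1, 2, 2], weights)
```

In a decoupled setup like the one the abstract describes, a loss of this kind would be applied when re-training the classifier on top of a feature extractor that has already been trained; how the two stages are configured in the paper is not specified here.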
