Paper Title

Predicting Affective Vocal Bursts with Finetuned wav2vec 2.0

Authors

Bagus Tris Atmaja, Akira Sasou

Abstract

Studies predicting affective states from the human voice have relied heavily on speech. This study instead explores recognizing humans' affective states from their vocal bursts, which are short non-verbal vocalizations. Borrowing from the recent success of wav2vec 2.0, we evaluated wav2vec 2.0 models finetuned on different datasets for predicting the affective state of a speaker from their vocal bursts. These finetuned wav2vec 2.0 models were then trained on the vocal burst data. The results show that the finetuned wav2vec 2.0 models, particularly those finetuned on an affective speech dataset, outperform the baseline model, which uses handcrafted acoustic features. However, the gap between models finetuned on a non-affective speech dataset and on an affective speech dataset is not large.
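The abstract describes placing a prediction head on top of a finetuned wav2vec 2.0 encoder to map a vocal burst to continuous affective dimensions. The paper's own code is not shown here; the sketch below is a minimal, hypothetical PyTorch regression head (names, layer sizes, and the number of affective dimensions are assumptions, and a random tensor stands in for the encoder's frame-level embeddings):

```python
import torch
import torch.nn as nn

class VocalBurstRegressor(nn.Module):
    """Hypothetical regression head on top of frame-level wav2vec 2.0
    embeddings: mean-pool over time, then predict continuous affective
    dimensions (e.g. valence and arousal)."""

    def __init__(self, hidden_size: int = 768, num_dims: int = 2):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(hidden_size, 256),
            nn.ReLU(),
            nn.Linear(256, num_dims),
        )

    def forward(self, frame_embeddings: torch.Tensor) -> torch.Tensor:
        # frame_embeddings: (batch, time, hidden_size), as produced by a
        # frozen or finetuned wav2vec 2.0 encoder
        pooled = frame_embeddings.mean(dim=1)  # temporal mean pooling
        return self.head(pooled)

# Stand-in for encoder output: 4 clips, 50 frames, 768-dim features
emb = torch.randn(4, 50, 768)
model = VocalBurstRegressor()
out = model(emb)
print(out.shape)  # torch.Size([4, 2])
```

In practice the encoder embeddings would come from a model such as the `Wav2Vec2Model` class in Hugging Face Transformers, with the head trained (and optionally the encoder further finetuned) on the vocal burst dataset.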
