Dynamic Layer Customization for Noise Robust Speech Emotion Recognition in Heterogeneous Condition Training

Paper Title

Dynamic Layer Customization for Noise Robust Speech Emotion Recognition in Heterogeneous Condition Training

Authors

Alex Wilf, Emily Mower Provost

Abstract

Robustness to environmental noise is important to creating automatic speech emotion recognition systems that are deployable in the real world. Prior work on noise robustness has assumed that systems would not make use of sample-by-sample training noise conditions, or that they would have access to unlabelled testing data to generalize across noise conditions. We avoid these assumptions and introduce the resulting task as heterogeneous condition training. We show that with full knowledge of the test noise conditions, we can improve performance by dynamically routing samples to specialized feature encoders for each noise condition, and with partial knowledge, we can use known noise conditions and domain adaptation algorithms to train systems that generalize well to unseen noise conditions. We then extend these improvements to the multimodal setting by dynamically routing samples to maintain temporal ordering, resulting in significant improvements over approaches that do not specialize or generalize based on noise type.
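The routing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class names `LinearEncoder` and `DynamicRouter`, the toy single-layer encoder, and the shared fallback encoder for unseen noise conditions are all assumptions made for illustration (the abstract handles unseen conditions with domain adaptation, which is not shown here). Each sample carries a noise-condition label and is sent to the encoder specialized for that condition, with outputs returned in the original sample order.

```python
import random


class LinearEncoder:
    """Toy feature encoder: one dense layer followed by ReLU (hypothetical stand-in)."""

    def __init__(self, in_dim, out_dim, seed):
        rng = random.Random(seed)
        self.weights = [
            [rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)
        ]

    def __call__(self, features):
        # Dot product of each weight row with the input, then ReLU.
        return [
            max(0.0, sum(w * x for w, x in zip(row, features)))
            for row in self.weights
        ]


class DynamicRouter:
    """Routes each sample to the encoder specialized for its noise condition."""

    def __init__(self, conditions, in_dim, out_dim):
        # One specialized encoder per known noise condition.
        self.encoders = {
            cond: LinearEncoder(in_dim, out_dim, seed=i)
            for i, cond in enumerate(conditions)
        }
        # Assumption: a shared encoder handles conditions not seen in training.
        self.fallback = LinearEncoder(in_dim, out_dim, seed=len(conditions))

    def encode(self, features, condition):
        encoder = self.encoders.get(condition, self.fallback)
        return encoder(features)

    def encode_batch(self, batch):
        # batch is a list of (features, condition) pairs; iterating in order
        # keeps the outputs aligned with the input sample order.
        return [self.encode(features, cond) for features, cond in batch]


router = DynamicRouter(["clean", "babble"], in_dim=4, out_dim=3)
batch = [
    ([0.2, 0.1, 0.4, 0.3], "clean"),
    ([0.5, 0.9, 0.1, 0.0], "babble"),
    ([0.3, 0.3, 0.3, 0.3], "street"),  # unseen condition, goes to the fallback
]
encoded = router.encode_batch(batch)
```

In a trained system the per-condition encoders would be learned networks and the downstream emotion classifier would consume `encoded`; the sketch only shows the dispatch step.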
