Paper title
EmotionNAS: Two-stream Neural Architecture Search for Speech Emotion Recognition
Paper authors
Paper abstract
Speech emotion recognition (SER) is an important research topic in human-computer interaction. Existing works mainly rely on human expertise to design models. Despite their success, different datasets often require distinct structures and hyperparameters, and searching for an optimal model for each dataset is time-consuming and labor-intensive. To address this problem, we propose a two-stream neural architecture search (NAS) based framework, called "EmotionNAS". Specifically, we take two-stream features (i.e., handcrafted and deep features) as the inputs and use NAS to search for the optimal structure for each stream. Furthermore, we incorporate complementary information from the different streams through an efficient information supplement module. Experimental results demonstrate that our method outperforms existing manually designed and NAS-based models, setting a new state-of-the-art record.