Paper Title

Wakeword Detection under Distribution Shifts

Paper Authors

Sree Hari Krishnan Parthasarathi, Lu Zeng, Christin Jose, Joseph Wang

Paper Abstract

We propose a novel approach for semi-supervised learning (SSL) designed to overcome distribution shifts between training and real-world data arising in the keyword spotting (KWS) task. Shifts from the training data distribution are a key challenge for real-world KWS tasks: when a new model is deployed on device, the gating of the accepted data undergoes a shift in distribution, making the problem of timely updates via subsequent deployments hard. Despite the shift, we assume that the marginal distributions on labels do not change. We utilize a modified teacher/student training framework, where labeled training data is augmented with unlabeled data. Note that the teacher does not have access to the new distribution either. To train effectively with a mix of human- and teacher-labeled data, we develop a teacher labeling strategy based on confidence heuristics to reduce entropy in the label distribution from the teacher model; the data is then sampled to match the marginal distribution on the labels. Large-scale experimental results show that a convolutional neural network (CNN) trained on far-field audio, and evaluated on far-field audio drawn from a different distribution, obtains a 14.3% relative improvement in false discovery rate (FDR) at equal false reject rate (FRR), while yielding a 5% improvement in FDR under no distribution shift. Under a more severe distribution shift from far-field to near-field audio with a smaller fully connected network (FCN), our approach achieves a 52% relative improvement in FDR at equal FRR, while yielding a 20% relative improvement in FDR on the original distribution.
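The abstract describes the teacher labeling strategy only at a high level: filter teacher predictions with a confidence heuristic to reduce the entropy of the teacher's label distribution, then subsample the pseudo-labeled data so its label marginal matches that of the human-labeled data. The sketch below is a minimal illustration of those two steps under our own assumptions; the function names (`pseudo_label`, `sample_to_marginal`), the threshold value, and the exact sampling scheme are hypothetical and are not taken from the paper.

```python
import numpy as np

def pseudo_label(teacher_probs, threshold=0.9):
    # Keep only teacher predictions whose confidence exceeds the threshold;
    # this is one simple confidence heuristic for lowering the entropy of
    # the label distribution produced by the teacher (threshold is assumed).
    confidence = teacher_probs.max(axis=1)
    keep = np.flatnonzero(confidence >= threshold)
    return keep, teacher_probs.argmax(axis=1)[keep]

def sample_to_marginal(indices, labels, target_marginal, seed=0):
    # Subsample the pseudo-labeled pool so its empirical label distribution
    # matches the marginal label distribution of the human-labeled data.
    rng = np.random.default_rng(seed)
    counts = np.bincount(labels, minlength=len(target_marginal))
    # The scarcest class (relative to its target share) caps the total size.
    budget = int(min(counts[c] / p for c, p in enumerate(target_marginal) if p > 0))
    chosen = []
    for c, p in enumerate(target_marginal):
        pool = indices[labels == c]
        take = min(int(round(budget * p)), len(pool))
        chosen.append(rng.choice(pool, size=take, replace=False))
    return np.concatenate(chosen)

# Toy usage: a binary wake/non-wake task with an assumed 10%/90% label marginal.
teacher_probs = np.random.default_rng(1).dirichlet([1.0, 1.0], size=10000)
kept, labels = pseudo_label(teacher_probs, threshold=0.9)
selected = sample_to_marginal(kept, labels, target_marginal=[0.9, 0.1])
```

The selected indices would then be merged with the human-labeled set for student training; the actual heuristic and sampler used by the authors may differ from this sketch.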
