Paper Title

Intel Labs at Ego4D Challenge 2022: A Better Baseline for Audio-Visual Diarization

Author

Min, Kyle

Abstract

This report describes our approach to the Audio-Visual Diarization (AVD) task of the Ego4D Challenge 2022. Specifically, we present multiple technical improvements over the official baselines. First, we improve the detection of the camera wearer's voice activity by modifying the training scheme of its model. Second, we find that an off-the-shelf voice activity detection (VAD) model can effectively remove false positives when it is applied solely to the camera wearer's voice activity. Lastly, we show that better active speaker detection leads to better AVD results. Our final method obtains a diarization error rate (DER, lower is better) of 65.9% on the Ego4D test set, significantly outperforming all the baselines. Our submission achieved 1st place in the Ego4D Challenge 2022.
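The second improvement (running an off-the-shelf VAD only over the camera wearer's predicted speech to remove false positives) can be illustrated with a minimal sketch. It assumes both the wearer-speech predictions and the VAD output are lists of `(start, end)` intervals in seconds, and that "filtering" means keeping only the portions of wearer speech that the VAD also marks as speech. The function name `filter_wearer_segments` and the `min_duration` parameter are hypothetical, and the report's actual filtering rule may differ.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec)


def filter_wearer_segments(
    wearer_segments: List[Segment],
    vad_segments: List[Segment],
    min_duration: float = 0.0,
) -> List[Segment]:
    """Hypothetical post-processing: keep only the parts of predicted
    camera-wearer speech that the VAD also labels as speech."""
    vad_sorted = sorted(vad_segments)
    kept: List[Segment] = []
    for w_start, w_end in sorted(wearer_segments):
        for v_start, v_end in vad_sorted:
            if v_start >= w_end:
                break  # later VAD segments start after this wearer segment ends
            # Intersection of the wearer segment and the VAD segment.
            start, end = max(w_start, v_start), min(w_end, v_end)
            if end - start > min_duration:
                kept.append((start, end))
    return kept


if __name__ == "__main__":
    wearer = [(0.0, 2.0), (5.0, 6.0), (8.0, 10.0)]
    vad = [(0.5, 1.5), (9.0, 12.0)]
    # The (5.0, 6.0) prediction has no VAD support and is dropped.
    print(filter_wearer_segments(wearer, vad))  # [(0.5, 1.5), (9.0, 10.0)]
```

Intersecting rather than simply dropping unsupported segments also trims wearer predictions that extend into silence; a segment-level keep/drop rule based on an overlap threshold would be an equally plausible variant.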