Paper Title
Listen to What You Want: Neural Network-based Universal Sound Selector
Authors
Abstract
Being able to control which acoustic events (AEs) we listen to would allow the development of more controllable hearable devices. This paper addresses the AE sound selection (or removal) problem, which we define as the extraction (or suppression) of all the sounds that belong to one or multiple desired AE classes. Although this problem could be addressed by source separation followed by AE classification, such a cascade is a sub-optimal solution. Moreover, source separation usually requires knowing the maximum number of sources, which may not be practical when dealing with AEs. In this paper, we instead propose a universal sound selection neural network that directly selects AE sounds from a mixture given user-specified target AE classes. The proposed framework can be explicitly optimized to simultaneously select sounds from multiple desired AE classes, independently of the number of sources in the mixture. We experimentally show that the proposed method achieves promising AE sound selection performance and can generalize to mixtures with numbers of sources not seen during training.
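The selection and removal tasks defined in the abstract can be illustrated with a small sketch of how reference signals might be built for one training mixture. This is not the authors' implementation: the function names `class_vector` and `selection_targets`, the multi-hot encoding of the user-specified classes, and the toy class names are all assumptions made for illustration. The only part taken from the abstract is the definition itself: the selection target is the sum of every source belonging to a desired class, and the removal target is what remains of the mixture.

```python
from typing import List, Sequence, Set, Tuple


def class_vector(desired: Set[str], all_classes: Sequence[str]) -> List[int]:
    """Encode the user-specified classes as a multi-hot vector.

    Assumption: a multi-hot encoding is only one plausible way to pass
    the desired classes to a network; the abstract does not specify it.
    """
    return [1 if name in desired else 0 for name in all_classes]


def selection_targets(
    sources: Sequence[Sequence[float]],
    source_classes: Sequence[str],
    desired: Set[str],
) -> Tuple[List[float], List[float], List[float]]:
    """Return (mixture, selection_target, removal_target) for one example.

    The selection target sums every source whose class is desired, so its
    definition does not depend on how many sources the mixture contains.
    """
    if len(sources) != len(source_classes):
        raise ValueError("need exactly one class label per source")
    n_samples = len(sources[0]) if sources else 0
    mixture = [0.0] * n_samples
    selected = [0.0] * n_samples
    for waveform, label in zip(sources, source_classes):
        for t, sample in enumerate(waveform):
            mixture[t] += sample
            if label in desired:
                selected[t] += sample
    # Removal (suppression) keeps everything that was not selected.
    removed = [m - s for m, s in zip(mixture, selected)]
    return mixture, selected, removed


# Toy example: three sources, two of which share the hypothetical class "dog".
sources = [[1.0, 2.0], [0.5, 0.5], [0.0, -1.0]]
labels = ["dog", "siren", "dog"]
mixture, selected, removed = selection_targets(sources, labels, {"dog"})
condition = class_vector({"dog"}, ["dog", "siren", "speech"])
```

In this toy case both "dog" sources are folded into a single selection target, which shows why the formulation needs no upper bound on the number of sources: the output is always one signal per query, however many sources match the desired classes.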