Paper title
RoSS: Utilizing Robotic Rotation for Audio Source Separation
Paper authors
Paper abstract
This paper considers the problem of audio source separation, where the goal is to isolate a target audio signal (say, Alice's speech) from a mixture of multiple interfering signals (e.g., when many people are talking). This problem has gained renewed interest, mainly due to the significant growth in voice-controlled devices, including robots in homes, offices, and other public facilities. Although a rich body of work exists on the core topic of source separation, we find that robotic motion of the microphone, say the robot's head, is a complementary opportunity to past approaches. Briefly, we show that rotating the microphone array to the correct orientation can produce desired aliasing between two interferers, causing the two interferers to pose as one. In other words, a mixture of K signals becomes a mixture of (K-1), a mathematically concrete gain. We show that the gain translates well to practice, provided two mobility-related challenges can be mitigated. This paper is focused on mitigating these challenges and demonstrating the end-to-end performance on a fully functional prototype. We believe that our Rotational Source Separation module, RoSS, could be plugged into actual robot heads, or into other devices (like Amazon Show) that are also capable of rotation.
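The abstract does not specify the array geometry or how the aliasing orientation is found, so the following is only an illustrative sketch under stated assumptions: a two-microphone array, far-field sources, and a hypothetical spacing of 0.1 m. Under that model, two sources placed symmetrically about the array axis produce the same inter-microphone delay, so rotating the axis to bisect two interferers makes them indistinguishable by delay. The function names (`tdoa`, `aliasing_orientation`, `count_distinct_delays`) are made up for this example and are not taken from the paper.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def tdoa(source_deg, array_deg, spacing_m=0.1):
    """Far-field time difference of arrival (seconds) at a two-mic array.

    source_deg: azimuth of the source in the room frame.
    array_deg:  azimuth of the array axis (the line through both mics).
    spacing_m:  microphone spacing; 0.1 m is a hypothetical value.
    """
    relative = math.radians(source_deg - array_deg)
    return spacing_m * math.cos(relative) / SPEED_OF_SOUND


def aliasing_orientation(interferer_a_deg, interferer_b_deg):
    """Array orientation whose axis bisects the two interferers.

    cos() is even, so sources at +x and -x degrees from the axis
    yield identical delays under the assumed model.
    """
    return (interferer_a_deg + interferer_b_deg) / 2.0


def count_distinct_delays(azimuths_deg, array_deg, tol=1e-9):
    """Count sources that remain distinguishable by delay alone."""
    delays = sorted(tdoa(a, array_deg) for a in azimuths_deg)
    if not delays:
        return 0
    distinct = 1
    for previous, current in zip(delays, delays[1:]):
        if current - previous > tol:
            distinct += 1
    return distinct


if __name__ == "__main__":
    target, interferer_a, interferer_b = 100.0, 20.0, 140.0
    sources = [target, interferer_a, interferer_b]
    rotated = aliasing_orientation(interferer_a, interferer_b)
    print("distinct delays before rotation:", count_distinct_delays(sources, 0.0))
    print("distinct delays after rotation: ", count_distinct_delays(sources, rotated))
```

In this toy setup, K = 3 delay signatures collapse to K - 1 = 2 once the array is rotated. The sketch ignores reverberation, near-field effects, and the case where the target itself lies symmetric to an interferer, and it says nothing about the two mobility-related challenges the abstract mentions.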