Paper Title

Joint Blind Room Acoustic Characterization From Speech And Music Signals Using Convolutional Recurrent Neural Networks

Paper Authors

Paul Callens, Milos Cernak

Paper Abstract

Acoustic environment characterization opens doors for sound reproduction innovations, smart EQing, speech enhancement, hearing aids, and forensics. Reverberation time, clarity, and direct-to-reverberant ratio are acoustic parameters that have been defined to describe reverberant environments. They are closely related to speech intelligibility and sound quality. As explained in the ISO 3382 standard, they can be derived from a room measurement called the Room Impulse Response (RIR). However, measuring RIRs requires specific equipment and intrusive sounds to be played. Recent advances in audio processing combined with machine learning suggest that one could estimate those parameters blindly using speech or music signals. We follow these advances and propose a robust end-to-end method to achieve blind joint acoustic parameter estimation using speech and/or music signals. Our results indicate that convolutional recurrent neural networks perform best for this task, and including music in training also helps improve inference from speech.
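As a concrete illustration of the RIR-based definitions the abstract refers to, the sketch below (not the authors' code) shows how reverberation time, clarity, and direct-to-reverberant ratio can be computed from a measured RIR in the spirit of ISO 3382. The 50 ms early/late split for clarity (C50) and the 2.5 ms direct-path window for DRR are illustrative assumptions.

```python
# Minimal sketch of deriving RT60, C50, and DRR from a Room Impulse Response.
import numpy as np


def rt60_from_rir(rir, fs):
    """RT60 via Schroeder backward integration and a -5/-25 dB line fit (T20-based)."""
    energy = rir.astype(float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]               # backward-integrated energy decay
    edc_db = 10.0 * np.log10(edc / edc[0] + 1e-12)    # normalized decay curve in dB
    t = np.arange(len(rir)) / fs
    mask = (edc_db <= -5.0) & (edc_db >= -25.0)       # fit range, extrapolated to -60 dB
    slope = np.polyfit(t[mask], edc_db[mask], 1)[0]   # dB per second (negative)
    return -60.0 / slope


def clarity_from_rir(rir, fs, early_ms=50.0):
    """Clarity index (C50 by default): early-to-late energy ratio in dB."""
    split = int(fs * early_ms / 1000.0)
    early = np.sum(rir[:split] ** 2)
    late = np.sum(rir[split:] ** 2)
    return 10.0 * np.log10(early / (late + 1e-12))


def drr_from_rir(rir, fs, direct_ms=2.5):
    """Direct-to-reverberant ratio: energy around the direct path vs. the remainder."""
    peak = int(np.argmax(np.abs(rir)))
    half = int(fs * direct_ms / 1000.0)
    direct = np.sum(rir[max(0, peak - half):peak + half] ** 2)
    reverberant = np.sum(rir ** 2) - direct
    return 10.0 * np.log10(direct / (reverberant + 1e-12))
```

The abstract names convolutional recurrent neural networks as the best-performing model for blind joint estimation from speech or music. The following is a hypothetical PyTorch sketch of a CRNN of that general kind, not the authors' published architecture; the log-mel input features, layer sizes, and the class name CRNNAcousticEstimator are assumptions for illustration.

```python
# Hypothetical CRNN: convolutions over a time-frequency input, a recurrent layer
# over time, and a joint regression head for RT60, clarity, and DRR.
import torch
import torch.nn as nn


class CRNNAcousticEstimator(nn.Module):
    def __init__(self, n_mels=64, n_params=3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
        )
        self.gru = nn.GRU(input_size=64 * (n_mels // 4), hidden_size=128, batch_first=True)
        self.head = nn.Linear(128, n_params)  # jointly predicts RT60, C50, DRR

    def forward(self, x):
        # x: (batch, 1, n_mels, time) log-mel spectrogram of a speech or music signal
        z = self.conv(x)                      # (batch, 64, n_mels // 4, time)
        z = z.permute(0, 3, 1, 2).flatten(2)  # (batch, time, 64 * n_mels // 4)
        _, h = self.gru(z)                    # h: (1, batch, 128), last hidden state
        return self.head(h[-1])               # (batch, n_params)
```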
