BatVision with GCC-PHAT Features for Better Sound to Vision Predictions

Paper title

BatVision with GCC-PHAT Features for Better Sound to Vision Predictions

Authors

Jesper Haahr Christensen, Sascha Hornauer, Stella Yu

Abstract

Inspired by sophisticated echolocation abilities found in nature, we train a generative adversarial network to predict plausible depth maps and grayscale layouts from sound. To achieve this, our sound-to-vision model processes binaural echo returns from chirping sounds. We build upon our previous work, BatVision, which consists of a sound-to-vision model and a dataset self-collected with our mobile robot and low-cost hardware. We improve on the previous model by introducing several changes that lead to better depth and grayscale estimation and increased perceptual quality. Rather than using raw binaural waveforms as input, we compute generalized cross-correlation (GCC) features and feed these to the model. In addition, we redesign the generator around residual learning and apply spectral normalization in the discriminator. We compare against our previous BatVision model and report both quantitative and qualitative improvements.
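
To illustrate the GCC-PHAT features named in the title, the following is a minimal NumPy sketch of generalized cross-correlation with phase transform (PHAT) weighting between the two channels of a binaural recording. It is not the authors' implementation: the function name gcc_phat, the zero-padding length, and the max_lag window are assumptions, and the abstract does not state the paper's framing, sampling rate, or lag range.

```python
import numpy as np


def gcc_phat(sig, ref, max_lag=None, eps=1e-12):
    """Generalized cross-correlation with PHAT weighting (illustrative sketch).

    Returns (lags, cc), where cc[i] is the PHAT-weighted cross-correlation
    at lag lags[i]. A positive peak lag means `sig` is delayed relative to `ref`.
    """
    sig = np.asarray(sig, dtype=float)
    ref = np.asarray(ref, dtype=float)
    # Zero-pad so the circular correlation computed via FFT equals the linear one.
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    # Cross-power spectrum.
    cross = SIG * np.conj(REF)
    # PHAT weighting: normalize each bin's magnitude to ~1 so only phase (timing) remains.
    cross /= np.abs(cross) + eps
    cc = np.fft.irfft(cross, n=n)
    # Symmetric lag window; (n - 1) // 2 keeps negative and positive lags distinct.
    limit = (n - 1) // 2
    max_lag = limit if max_lag is None else min(max_lag, limit)
    # irfft puts lag 0 at index 0 and negative lags at the end; reorder to
    # -max_lag, ..., 0, ..., +max_lag.
    cc = np.concatenate((cc[n - max_lag:], cc[: max_lag + 1]))
    lags = np.arange(-max_lag, max_lag + 1)
    return lags, cc
```

In a binaural setting, sig and ref would be the left and right microphone channels; a peak at lag k means sig lags ref by k samples (k / fs seconds for a sampling rate fs). A learned model could take the whole cc array over a physically plausible lag window as its input feature rather than only the peak location; how BatVision frames and stacks these vectors is not specified here.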
