Title
Learning Visual Styles from Audio-Visual Associations
Authors
Abstract
From the patter of rain to the crunch of snow, the sounds we hear often convey the visual textures that appear within a scene. In this paper, we present a method for learning visual styles from unlabeled audio-visual data. Our model learns to manipulate the texture of a scene to match a sound, a problem we term audio-driven image stylization. Given a dataset of paired audio-visual data, we learn to modify input images such that, after manipulation, they are more likely to co-occur with a given input sound. In quantitative and qualitative evaluations, our sound-based model outperforms label-based approaches. We also show that audio can be an intuitive representation for manipulating images, as adjusting a sound's volume or mixing two sounds together results in predictable changes to visual style. Project webpage: https://tinglok.netlify.app/files/avstyle
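The core idea of the abstract, modifying an image so that it is more likely to co-occur with a given sound, can be illustrated with a toy sketch. This is not the paper's actual model (which is learned from paired audio-visual data); the encoder, generator, and critic below are hypothetical stand-ins using random linear maps and cosine similarity, purely to show the co-occurrence objective:

```python
import numpy as np

rng = np.random.default_rng(0)

D = 8
W_a = rng.normal(size=(D, 16))      # hypothetical audio-encoder weights
audio = rng.normal(size=16)         # stand-in for e.g. a rain clip's features
image = rng.normal(size=(32, D))    # 32 "pixels" with D feature channels

def embed_audio(a):
    # Hypothetical audio encoder: a fixed random linear projection.
    return W_a @ a

def stylize(img, style):
    # Hypothetical generator: nudge image features toward the audio style.
    return img + 0.1 * style        # style vector broadcast over pixels

def cooccurrence_score(img, style):
    # Hypothetical critic: cosine similarity between globally pooled
    # image features and the audio embedding.
    feat = img.mean(axis=0)
    return float(feat @ style /
                 (np.linalg.norm(feat) * np.linalg.norm(style) + 1e-8))

style = embed_audio(audio)
before = cooccurrence_score(image, style)
after = cooccurrence_score(stylize(image, style), style)
# Stylization raises the audio-visual co-occurrence score (after > before).
```

Because the toy generator adds a positive component along the audio embedding, the cosine-similarity critic is guaranteed to score the stylized image higher; the paper's contribution is learning real versions of these components from unlabeled audio-visual data.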