Cross-Modal Contrastive Representation Learning for Audio-to-Image Generation

Paper Title

Cross-Modal Contrastive Representation Learning for Audio-to-Image Generation

Authors

HaeChun Chung, JooYong Shim, Jong-Kook Kim

Abstract

Multiple modalities of the same information offer different perspectives on it, which can improve understanding. Generating data in a modality different from that of the existing data may therefore be crucial for enhancing understanding. In this paper, we investigate the cross-modal audio-to-image generation problem and propose Cross-Modal Contrastive Representation Learning (CMCRL) to extract useful features from audio and use them in the generation phase. Experimental results show that CMCRL improves the quality of generated images compared with previous research.
