Paper Title
Multimodal Image Fusion based on Hybrid CNN-Transformer and Non-local Cross-modal Attention
Paper Authors
Abstract
The fusion of images taken by heterogeneous sensors helps to enrich information and improve imaging quality. In this article, we present a hybrid model consisting of a convolutional encoder and a Transformer-based decoder to fuse multimodal images. In the encoder, a non-local cross-modal attention block is proposed to capture both local and global dependencies of multiple source images. A branch fusion module is designed to adaptively fuse the features of the two branches. We embed a Transformer module with linear complexity in the decoder to enhance the reconstruction capability of the proposed network. Qualitative and quantitative experiments demonstrate the effectiveness of the proposed method by comparing it with existing state-of-the-art fusion models. The source code of our work is available at https://github.com/pandayuanyu/HCFusion.
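The abstract only names the architectural components; the authors' actual implementation is at the GitHub link above. As a rough, hypothetical illustration of the non-local cross-modal attention idea alone, the sketch below lets every spatial position of one modality attend to every position of the other. All names, shapes, and the omission of learned query/key/value projections are simplifying assumptions, not the paper's design:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(feat_a, feat_b):
    """Toy non-local attention where modality A queries modality B.

    feat_a, feat_b: (N, C) arrays of flattened spatial features
    (N positions, C channels). Queries come from A, keys/values
    from B, so each position of A aggregates global context from
    the entire other modality -- the "non-local" dependency.
    """
    q, k, v = feat_a, feat_b, feat_b  # assumption: no learned projections
    scale = 1.0 / np.sqrt(q.shape[-1])
    attn = softmax(q @ k.T * scale, axis=-1)  # (N, N) cross-modal affinity
    return attn @ v  # A's features enriched with B's global context

# toy example: 4 spatial positions, 8 channels per modality
rng = np.random.default_rng(0)
ir = rng.standard_normal((4, 8))    # e.g. infrared features (hypothetical)
vis = rng.standard_normal((4, 8))   # e.g. visible-light features
fused = cross_modal_attention(ir, vis)
print(fused.shape)  # (4, 8)
```

Note the quadratic (N, N) affinity matrix here: the linear-complexity Transformer module the abstract mentions for the decoder would avoid exactly this cost, but its specific formulation is not given in the abstract.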