Towards Semantic Communications: Deep Learning-Based Image Semantic Coding

Paper Title

Towards Semantic Communications: Deep Learning-Based Image Semantic Coding

Authors

Danlan Huang, Feifei Gao, Xiaoming Tao, Qiyuan Du, Jianhua Lu

Abstract

Semantic communication has received growing interest because it can remarkably reduce the amount of data to be transmitted without losing critical information. Most existing works explore semantic encoding and transmission for text and apply techniques from Natural Language Processing (NLP) to interpret the meaning of the text. In this paper, we conceive semantic communications for image data, which is much richer in semantics and more bandwidth sensitive. We propose a reinforcement learning based adaptive semantic coding (RL-ASC) approach that encodes images beyond the pixel level. First, we define the semantic concept of image data, which includes the category, spatial arrangement, and visual feature as the representation unit, and propose a convolutional semantic encoder to extract semantic concepts. Second, we propose an image reconstruction criterion that evolves from traditional pixel similarity to semantic similarity and perceptual performance. Third, we design a novel RL-based semantic bit allocation model, whose reward is the increase in rate-semantic-perceptual performance after encoding a certain semantic concept with an adaptive quantization level. Thus, the task-related information is preserved and reconstructed properly, while less important data is discarded. Finally, we propose a Generative Adversarial Nets (GANs) based semantic decoder that fuses local and global features via an attention module. Experimental results demonstrate that the proposed RL-ASC is noise robust, can reconstruct visually pleasant and semantically consistent images, and reduces the bit cost several times over compared to standard codecs and other deep learning-based image codecs.
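The bit allocation idea in the abstract (a reward equal to the increase in rate-semantic-perceptual performance after encoding one semantic concept at a chosen quantization level) can be illustrated with a small sketch. This is not the paper's RL model: it replaces the learned policy with a plain greedy loop, and the linear `score` function, the weight `lam`, the `Level` fields, and all numbers are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    """One candidate quantization level for a semantic concept (hypothetical fields)."""
    bits: int          # bits spent if the concept is encoded at this level
    semantic: float    # assumed gain in semantic similarity
    perceptual: float  # assumed gain in perceptual quality


def score(bits, semantic, perceptual, lam=0.01):
    # Assumed rate-semantic-perceptual objective: quality terms minus a rate penalty.
    return semantic + perceptual - lam * bits


def reward(state, level, lam=0.01):
    # Reward = increase in the objective after encoding one concept at `level`.
    bits, semantic, perceptual = state
    nxt = (bits + level.bits, semantic + level.semantic, perceptual + level.perceptual)
    return score(*nxt, lam) - score(*state, lam), nxt


def greedy_allocate(concepts, lam=0.01):
    # Greedy stand-in for the learned policy: per concept, keep the level with the
    # largest positive reward, or discard the concept (None) if no level helps.
    state = (0, 0.0, 0.0)
    plan = {}
    for name, levels in concepts.items():
        best_idx, best_reward, best_state = None, 0.0, state
        for idx, level in enumerate(levels):
            r, nxt = reward(state, level, lam)
            if r > best_reward:
                best_idx, best_reward, best_state = idx, r, nxt
        plan[name] = best_idx
        state = best_state
    return plan, state


# Made-up example: a salient object and a low-importance background region.
concepts = {
    "person": [Level(8, 0.2, 0.1), Level(32, 0.5, 0.3)],
    "sky": [Level(8, 0.01, 0.01), Level(32, 0.02, 0.02)],
}
plan, final_state = greedy_allocate(concepts)
```

With these made-up numbers the salient concept receives the finer quantization level and the background concept is dropped, which mirrors the abstract's statement that task-related information is preserved while less important data is discarded.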
