Paper Title

Lightweight Neural Network with Knowledge Distillation for CSI Feedback

Authors

Yiming Cui, Jiajia Guo, Zheng Cao, Huaze Tang, Chao-Kai Wen, Shi Jin, Xin Wang, Xiaolin Hou

Abstract

Deep learning has shown promise in enhancing channel state information (CSI) feedback. However, many studies indicate that better feedback performance often comes with higher computational complexity. Pursuing better performance-complexity tradeoffs is crucial to facilitate practical deployment, especially on computation-limited devices, which may have to use lightweight autoencoders with unfavorable performance. To achieve this goal, this paper introduces knowledge distillation (KD) to achieve better tradeoffs, where knowledge from a complicated teacher autoencoder is transferred to a lightweight student autoencoder for performance improvement. Specifically, two methods are proposed for implementation. First, an autoencoder KD-based method is introduced by training a student autoencoder to mimic the reconstructed CSI of a pretrained teacher autoencoder. Second, an encoder KD-based method is proposed to reduce training overhead by performing KD only on the student encoder. Additionally, a variant of encoder KD is introduced to protect the intellectual property of user equipment and base station vendors. Numerical simulations demonstrate that the proposed methods can significantly improve the student autoencoder's performance, while reducing the number of floating-point operations and the inference time to 3.05%-5.28% and 13.80%-14.76% of the teacher network's, respectively. Furthermore, the variant encoder KD method effectively enhances the student autoencoder's generalization capability across different scenarios, environments, and bandwidths.
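The first scheme in the abstract, autoencoder KD, trains the lightweight student to mimic the pretrained teacher's reconstructed CSI alongside the ground-truth CSI. A minimal sketch of how that distillation term enters the training objective, using toy linear autoencoders in plain NumPy instead of the paper's deep CSI networks (all dimensions, the loss weight `alpha`, and the optimizer settings are illustrative assumptions, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

d, k_t, k_s = 64, 32, 8                    # CSI length, teacher/student codeword sizes (assumed)
X = rng.standard_normal((d, 256))          # batch of flattened CSI samples (synthetic)

# Stand-in for the pretrained teacher: a fixed random linear autoencoder.
T_enc = rng.standard_normal((k_t, d)) / np.sqrt(d)
T_dec = rng.standard_normal((d, k_t)) / np.sqrt(k_t)
T_out = T_dec @ (T_enc @ X)                # teacher's reconstructed CSI: the KD target

# Lightweight student with a much smaller codeword (k_s << k_t).
W_e = 0.1 * rng.standard_normal((k_s, d))  # student encoder
W_d = 0.1 * rng.standard_normal((d, k_s))  # student decoder

alpha, lr, steps = 0.5, 0.5, 300           # assumed loss weight and optimizer settings

def kd_loss(W_e, W_d):
    out = W_d @ (W_e @ X)
    # weighted sum of ground-truth reconstruction term and distillation term
    return (alpha * np.mean((out - X) ** 2)
            + (1 - alpha) * np.mean((out - T_out) ** 2))

loss_before = kd_loss(W_e, W_d)
for _ in range(steps):                     # plain gradient descent on both student weights
    Z = W_e @ X
    out = W_d @ Z
    # residual mixes the ground-truth target and the teacher's output
    R = (2.0 / X.size) * (alpha * (out - X) + (1 - alpha) * (out - T_out))
    W_d = W_d - lr * (R @ Z.T)
    W_e = W_e - lr * (W_d.T @ R) @ X.T
loss_after = kd_loss(W_e, W_d)
print(loss_before, loss_after)
```

In the paper both models are full neural autoencoders and the student must also meet the feedback codeword budget; the linear models here only illustrate how the teacher-mimicking term is blended with the ordinary reconstruction loss. The encoder-KD variant instead matches only the encoders' outputs (the codewords), so the student decoder never needs the teacher during training.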
