Paper Title


Scheduling and Power Control for Wireless Multicast Systems via Deep Reinforcement Learning

Authors

Ramkumar Raghu, Mahadesh Panju, Vaneet Aggarwal, Vinod Sharma

Abstract


Multicasting in wireless systems is a natural way to exploit the redundancy in user requests in a Content Centric Network. Power control and optimal scheduling can significantly improve the wireless multicast network's performance under fading. However, the model-based approaches for power control and scheduling studied earlier are not scalable to large state spaces or changing system dynamics. In this paper, we use deep reinforcement learning, approximating the Q-function via a deep neural network, to obtain a power control policy that matches the optimal policy for a small network. We show that a power control policy can be learned for reasonably large systems via this approach. Further, we use multi-timescale stochastic optimization to maintain the average power constraint. We demonstrate that a slight modification of the learning algorithm allows tracking of time-varying system statistics. Finally, we extend the multi-timescale approach to simultaneously learn the optimal queueing strategy along with power control. We demonstrate the scalability, tracking, and cross-layer optimization capabilities of our algorithms via simulations. The proposed multi-timescale approach can be used in general large-state-space dynamical systems with multiple objectives and constraints, and may be of independent interest.
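The abstract's multi-timescale idea (a fast learning update combined with a slow dual update that enforces an average power constraint) can be illustrated with a minimal sketch. This is not the paper's algorithm: it uses tabular Q-learning instead of a deep network, a toy i.i.d. fading model, and hypothetical power levels and step sizes, but it shows the two-timescale structure — the Q-values adapt on the fast timescale to a Lagrangian-penalized reward, while the Lagrange multiplier ascends on the slow timescale whenever the running average power exceeds the budget.

```python
import numpy as np

# Toy setup (assumed for illustration): states are quantized channel
# gains, actions are discrete transmit power levels.
rng = np.random.default_rng(0)
n_states, n_actions = 4, 3
powers = np.array([0.0, 1.0, 2.0])   # power consumed by each action
P_avg = 1.0                          # average power budget (assumed)

Q = np.zeros((n_states, n_actions))
lam = 0.0                            # Lagrange multiplier for the constraint
alpha_fast = 0.1                     # fast timescale: Q-learning step size
alpha_slow = 0.01                    # slow timescale: dual-ascent step size
gamma = 0.9
avg_power = 0.0                      # running average of consumed power

s = 0
for t in range(20000):
    # epsilon-greedy action selection on the penalized Q-function
    if rng.random() < 0.1:
        a = int(rng.integers(n_actions))
    else:
        a = int(np.argmax(Q[s]))
    # toy throughput reward: grows with channel state and power
    reward = np.log1p((s + 1) * powers[a])
    # Lagrangian reward: penalize power via the current multiplier
    r_lag = reward - lam * powers[a]
    s_next = int(rng.integers(n_states))   # i.i.d. fading (toy assumption)
    # fast timescale: standard Q-learning update on the penalized reward
    Q[s, a] += alpha_fast * (r_lag + gamma * Q[s_next].max() - Q[s, a])
    # slow timescale: dual ascent, pushing lam up while the
    # running average power exceeds the budget P_avg
    avg_power += alpha_fast * (powers[a] - avg_power)
    lam = max(0.0, lam + alpha_slow * (avg_power - P_avg))
    s = s_next
```

Because the multiplier moves much more slowly than the Q-values, the fast update effectively sees a fixed penalty, which is the separation-of-timescales argument underlying this class of constrained RL methods; the paper replaces the table with a deep Q-network.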
