Paper Title
Deep Reinforcement Learning for Orchestrating Cost-Aware Reconfigurations of vRANs
Paper Authors
Paper Abstract
Virtualized Radio Access Networks (vRANs) are fully configurable and can be implemented at low cost over commodity platforms, enabling flexible network management. In this paper, a novel vRAN reconfiguration problem is formulated to jointly reconfigure the functional splits of the base stations (BSs), the locations of the virtualized central units (vCUs) and distributed units (vDUs), their resources, and the routing of each BS data flow. The objective is to minimize the long-term total network operation cost while adapting to varying traffic demands and resource availability. Testbed measurements are performed to study the relationship between traffic demands and computing resources, revealing a high variance that depends on the platform and its load; consequently, deriving an accurate model of the underlying system is non-trivial. Therefore, to solve the proposed problem, a deep reinforcement learning (RL)-based framework is developed using model-free RL approaches. Moreover, the problem involves multiple BSs sharing the same resources, which yields a multi-dimensional discrete action space and hence a combinatorial number of possible actions. To overcome this curse of dimensionality, an action branching architecture, an action decomposition method with a shared decision module followed by per-dimension neural network branches, is combined with the Dueling Double Deep Q-Network (D3QN) algorithm. Simulations are carried out using an O-RAN compliant model and real traces from the testbed. Our numerical results show that the proposed framework successfully learns an optimal policy that adaptively selects the vRAN configurations, and its learning convergence can be further expedited through transfer learning, even across different vRAN systems. It offers significant cost savings: up to 59\% compared with a static benchmark, 35\% compared with DDPG with discretization, and 76\% compared with non-branching D3QN.
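For concreteness, the long-term objective described in the abstract can be written as a discounted-cost Markov decision process. This is a hedged sketch; the symbols below are illustrative and are not the paper's exact notation:

$$\pi^{*} = \arg\min_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\right],$$

where $s_t$ captures the traffic demands and resource availability at decision step $t$, $a_t$ is the joint reconfiguration action (functional splits, vCU/vDU placement, allocated resources, and routing), $c(s_t, a_t)$ is the total network operation cost incurred at that step, and $\gamma \in [0,1)$ is a discount factor.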
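To illustrate the action branching idea, the sketch below (in PyTorch; the class name, layer sizes, and dimensions are assumptions for illustration, not the authors' implementation) shows a dueling Q-network with a shared decision module and one advantage branch per action dimension, so greedy selection decomposes per branch instead of searching the combinatorial joint action space:

```python
import torch
import torch.nn as nn


class BranchingDuelingQNet(nn.Module):
    """Sketch of a branching dueling Q-network: shared encoder,
    one state-value head, and one advantage branch per action dimension
    (e.g., one branch per BS reconfiguration decision)."""

    def __init__(self, state_dim: int, num_branches: int,
                 actions_per_branch: int, hidden: int = 128):
        super().__init__()
        # Shared decision module: encodes the vRAN state
        # (traffic demands, resource availability) into a common latent.
        self.shared = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Dueling decomposition: a single state-value head ...
        self.value = nn.Linear(hidden, 1)
        # ... and one advantage head per discrete action dimension.
        self.branches = nn.ModuleList(
            [nn.Linear(hidden, actions_per_branch) for _ in range(num_branches)]
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        z = self.shared(state)
        v = self.value(z)  # shape: (batch, 1)
        # Per-branch Q-values: Q_d(s, a_d) = V(s) + A_d(s, a_d) - mean_a A_d(s, a)
        qs = [v + a - a.mean(dim=-1, keepdim=True)
              for a in (branch(z) for branch in self.branches)]
        return torch.stack(qs, dim=1)  # (batch, num_branches, actions_per_branch)


# Greedy action selection decomposes over branches: one argmax per dimension,
# so the cost is linear (not exponential) in the number of BSs/dimensions.
net = BranchingDuelingQNet(state_dim=16, num_branches=4, actions_per_branch=5)
q = net(torch.randn(2, 16))
action = q.argmax(dim=-1)  # (batch, num_branches)
```

In a full D3QN-style agent, this network would be paired with a target copy and double-Q bootstrapping per branch; those training-loop details are omitted here.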