Title
A Reinforcement Learning Formulation of the Lyapunov Optimization: Application to Edge Computing Systems with Queue Stability
Authors
Abstract
In this paper, a deep reinforcement learning (DRL)-based approach to the Lyapunov optimization is considered to minimize the time-average penalty while maintaining queue stability. A proper construction of state and action spaces is provided to form a proper Markov decision process (MDP) for the Lyapunov optimization. A condition on the reinforcement learning (RL) reward function that ensures queue stability is derived. Based on the analysis and practical RL with reward discounting, a class of reward functions is proposed for the DRL-based approach to the Lyapunov optimization. The proposed DRL-based approach to the Lyapunov optimization does not require complicated optimization at each time step and operates with general non-convex and discontinuous penalty functions. Hence, it provides an alternative to the conventional drift-plus-penalty (DPP) algorithm for the Lyapunov optimization. The proposed DRL-based approach is applied to resource allocation in edge computing systems with queue stability, and numerical results demonstrate its successful operation.
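To illustrate the kind of reward shaping the abstract describes, the sketch below combines a negative penalty term with a negative quadratic Lyapunov drift term, so that maximizing the discounted RL return simultaneously discourages queue growth. This is an illustrative assumption based on standard Lyapunov (drift-plus-penalty) analysis, not the paper's exact reward class; the function names, the weight `beta`, and the quadratic Lyapunov function are all hypothetical choices.

```python
import numpy as np

def lyapunov_drift(q_now, q_next):
    # Quadratic Lyapunov function L(Q) = 0.5 * sum_i Q_i^2 (a common
    # choice in Lyapunov optimization); the drift at time t is
    # L(Q_{t+1}) - L(Q_t).
    return 0.5 * (np.sum(q_next ** 2) - np.sum(q_now ** 2))

def dpp_style_reward(penalty, q_now, q_next, beta=0.1):
    # Hypothetical per-step RL reward: negative penalty minus a
    # weighted Lyapunov drift. Larger queue growth lowers the reward,
    # nudging the learned policy toward queue stability, analogous to
    # how the DPP algorithm trades off penalty against drift.
    return -penalty - beta * lyapunov_drift(q_now, q_next)
```

With this shaping, a transition that lets the queues grow earns a strictly lower reward than one that holds them steady at the same penalty, which is the qualitative property the paper's stability condition formalizes.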