Paper Title
TASAC: a twin-actor reinforcement learning framework with stochastic policy for batch process control
Paper Authors
Abstract
Batch processes pose a challenge for process control because of their complex nonlinear dynamics and batch-to-batch variability. In the absence of accurate models, and with the resulting plant-model mismatch, these problems become harder to address with advanced model-based control strategies. Reinforcement Learning (RL), wherein an agent learns a policy by directly interacting with the environment, offers a potential alternative in this context. RL frameworks with an actor-critic architecture have recently become popular for controlling systems whose state and action spaces are continuous. It has been shown that an ensemble of actor and critic networks further helps the agent learn better policies, owing to the enhanced exploration that results from simultaneous policy learning. To this end, the current study proposes a stochastic actor-critic RL algorithm, termed Twin Actor Soft Actor-Critic (TASAC), which incorporates an ensemble of actors for learning in a maximum entropy framework for batch process control.
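The abstract does not specify the TASAC update rules, so the following is only a minimal illustrative sketch of how a twin-actor ensemble might plug into a SAC-style controller: two stochastic (Gaussian) actors each propose an action, and a critic-guided rule picks which candidate to execute. The class names, network sizes, and the argmax-over-Q selection rule are all assumptions for illustration, not the paper's method; the tanh log-probability correction used in full SAC implementations is also omitted for brevity.

```python
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Gaussian policy: maps a state to a squashed action sample and its log-prob."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * action_dim),
        )

    def forward(self, state):
        mean, log_std = self.net(state).chunk(2, dim=-1)
        dist = torch.distributions.Normal(mean, log_std.clamp(-5, 2).exp())
        a = dist.rsample()  # reparameterised sample, as in SAC
        # log-prob feeds the maximum-entropy objective (tanh correction omitted)
        return torch.tanh(a), dist.log_prob(a).sum(-1)


class Critic(nn.Module):
    """Q-network: maps a (state, action) pair to a scalar value estimate."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)


def select_action(actors, critic, state):
    # Assumed twin-actor rule: each actor proposes a candidate action and
    # the one the critic scores highest is executed, broadening exploration
    # relative to a single actor.
    candidates = [actor(state)[0] for actor in actors]
    q_values = torch.stack([critic(state, a) for a in candidates])
    return candidates[int(q_values.argmax())]


# Usage: a twin (two-member) actor ensemble sharing one critic.
state_dim, action_dim = 4, 1
actors = [Actor(state_dim, action_dim) for _ in range(2)]
critic = Critic(state_dim, action_dim)
action = select_action(actors, critic, torch.zeros(state_dim))
```

A full batch-process controller would also maintain twin critics with target networks and an entropy-temperature term, as is standard in SAC; the sketch above isolates only the actor-ensemble idea highlighted in the abstract.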