Paper title
Reinforcement Learning and Tree Search Methods for the Unit Commitment Problem
Paper authors
Paper abstract
The unit commitment (UC) problem, which determines operating schedules of generation units to meet demand, is a fundamental task in power systems operation. Existing UC methods using mixed-integer programming are not well-suited to highly stochastic systems. Approaches which more rigorously account for uncertainty could yield large reductions in operating costs by reducing spinning reserve requirements; operating power stations at higher efficiencies; and integrating greater volumes of variable renewables. A promising approach to solving the UC problem is reinforcement learning (RL), a methodology for optimal decision-making which has been used to conquer long-standing grand challenges in artificial intelligence. This thesis explores the application of RL to the UC problem and addresses challenges including robustness under uncertainty; generalisability across multiple problem instances; and scaling to larger power systems than previously studied. To tackle these issues, we develop guided tree search, a novel methodology combining model-free RL and model-based planning. The UC problem is formalised as a Markov decision process and we develop an open-source environment based on real data from Great Britain's power system to train RL agents. In problems of up to 100 generators, guided tree search is shown to be competitive with deterministic UC methods, reducing operating costs by up to 1.4\%. An advantage of RL is that the framework can be easily extended to incorporate considerations important to power systems operators such as robustness to generator failure, wind curtailment or carbon prices. When generator outages are considered, guided tree search saves over 2\% in operating costs as compared with methods using conventional $N-x$ reserve criteria.