Optimizing Crop Management with Reinforcement Learning and Imitation Learning

Paper title

Optimizing Crop Management with Reinforcement Learning and Imitation Learning

Authors

Tao, Ran; Zhao, Pan; Wu, Jing; Martin, Nicolas F.; Harrison, Matthew T.; Ferreira, Carla; Kalantari, Zahra; Hovakimyan, Naira

Abstract

Crop management, including nitrogen (N) fertilization and irrigation management, has a significant impact on the crop yield, economic profit, and the environment. Although management guidelines exist, it is challenging to find the optimal management practices given a specific planting environment and a crop. Previous work used reinforcement learning (RL) and crop simulators to solve the problem, but the trained policies either have limited performance or are not deployable in the real world. In this paper, we present an intelligent crop management system which optimizes the N fertilization and irrigation simultaneously via RL, imitation learning (IL), and crop simulations using the Decision Support System for Agrotechnology Transfer (DSSAT). We first use deep RL, in particular, deep Q-network, to train management policies that require all state information from the simulator as observations (denoted as full observation). We then invoke IL to train management policies that only need a limited amount of state information that can be readily obtained in the real world (denoted as partial observation) by mimicking the actions of the previously RL-trained policies under full observation. We conduct experiments on a case study using maize in Florida and compare trained policies with a maize management guideline in simulations. Our trained policies under both full and partial observations achieve better outcomes, resulting in a higher profit or a similar profit with a smaller environmental impact. Moreover, the partial-observation management policies are directly deployable in the real world as they use readily available information.
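The two-stage idea in the abstract (an expert policy that sees the full simulator state, then a second policy trained to mimic the expert's actions from a limited set of observations) can be illustrated with a toy behavior-cloning sketch. Everything below is an assumption made for illustration: the state variables, the action set, the hand-written `expert_policy` that stands in for an RL-trained policy, and the tabular majority-vote learner. It does not use DSSAT or a deep Q-network and is not the authors' implementation.

```python
import random
from collections import Counter, defaultdict

# Hypothetical full state: (soil_water, soil_nitrogen, days_after_planting).
# The toy action set below is an assumption, not the paper's action space.
ACTIONS = ["none", "irrigate", "fertilize"]


def expert_policy(full_state):
    """Stand-in for a policy trained with RL under full observation."""
    soil_water, soil_nitrogen, day = full_state
    if soil_water < 0.3:
        return "irrigate"
    if soil_nitrogen < 0.4 and day < 60:
        return "fertilize"
    return "none"


def partial_view(full_state):
    """Keep only what is assumed easy to measure: coarse soil water and a day bucket."""
    soil_water, _soil_nitrogen, day = full_state
    return (round(soil_water, 1), day // 20)


def sample_state(rng):
    """Draw a random toy state; a crop simulator would supply these in practice."""
    return (rng.random(), rng.random(), rng.randrange(0, 120))


def collect_demonstrations(n, seed=0):
    """Pair each partial observation with the action the expert took on the full state."""
    rng = random.Random(seed)
    demos = []
    for _ in range(n):
        state = sample_state(rng)
        demos.append((partial_view(state), expert_policy(state)))
    return demos


def behavior_cloning(demos):
    """Tabular imitation: for each partial observation, copy the expert's most frequent action."""
    votes = defaultdict(Counter)
    for obs, action in demos:
        votes[obs][action] += 1
    table = {obs: counts.most_common(1)[0][0] for obs, counts in votes.items()}
    return lambda obs: table.get(obs, "none")  # default for unseen observations


def agreement(student, n, seed=1):
    """Fraction of fresh states on which the student matches the expert."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        state = sample_state(rng)
        hits += student(partial_view(state)) == expert_policy(state)
    return hits / n


demos = collect_demonstrations(5000)
student = behavior_cloning(demos)
```

In this toy setup the student cannot match the expert perfectly, because soil nitrogen is hidden from it; `agreement(student, 2000)` gives a rough measure of how much of the expert's behavior survives the restriction to partial observations.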
