Paper Title

A sojourn-based approach to semi-Markov Reinforcement Learning

Authors

Giacomo Ascione and Salvatore Cuomo

Abstract

In this paper we introduce a new approach to discrete-time semi-Markov decision processes based on the sojourn time process. Different characterizations of discrete-time semi-Markov processes are exploited, and the decision processes are constructed by means of them. With this new approach, the agent is allowed to choose among different actions depending also on the sojourn time of the process in its current state. A numerical method based on $Q$-learning algorithms for finite-horizon reinforcement learning and stochastic recursive relations is investigated. Finally, we consider two toy examples: one in which the reward depends on the sojourn time, according to the gambler's fallacy; the other in which the environment is semi-Markov even though the reward function does not depend on the sojourn time. These are used to carry out some numerical evaluations of the previously presented $Q$-learning algorithm and of a naive alternative based on deep reinforcement learning.
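
To make the sojourn-time idea concrete, below is a minimal illustrative sketch, not the authors' algorithm, of tabular $Q$-learning on a state space augmented with the sojourn time. The `env` interface (`reset`/`step`), the hyperparameters, and the rule that the sojourn clock resets on a state change are all assumptions made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical sketch: tabular Q-learning where the agent's state is
# augmented with the sojourn time spent in the current environment state.
# `env` is assumed to expose reset() -> state and
# step(action) -> (next_state, reward, done).

def sojourn_q_learning(env, actions, episodes=500, horizon=50,
                       alpha=0.1, gamma=0.95, epsilon=0.1):
    # Q-table indexed by (state, sojourn_time) pairs, one value per action.
    Q = defaultdict(lambda: [0.0] * len(actions))

    for _ in range(episodes):
        state = env.reset()
        sojourn = 0  # time steps spent in the current state so far
        for _ in range(horizon):
            key = (state, sojourn)
            # epsilon-greedy action selection over the augmented state
            if random.random() < epsilon:
                a = random.randrange(len(actions))
            else:
                a = max(range(len(actions)), key=lambda i: Q[key][i])

            next_state, reward, done = env.step(actions[a])
            # assumed convention: the sojourn clock resets when the
            # state changes, and grows by one otherwise
            next_sojourn = sojourn + 1 if next_state == state else 0
            next_key = (next_state, next_sojourn)

            # standard Q-learning update on the augmented state space
            Q[key][a] += alpha * (reward + gamma * max(Q[next_key]) - Q[key][a])

            state, sojourn = next_state, next_sojourn
            if done:
                break
    return Q
```

The only change relative to plain $Q$-learning is that the table is indexed by (state, sojourn) pairs, so the learned policy can react to how long the process has remained in its current state; a finite-horizon variant in the spirit of the paper would additionally index the table by the time step.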
