Information-Theoretic Policy Learning from Partial Observations with Fully Informed Decision Makers

Paper Title

Information-Theoretic Policy Learning from Partial Observations with Fully Informed Decision Makers

Author

Lefebvre, Tom

Abstract

In this work we formulate and treat an extension of the Imitation from Observations problem. Imitation from Observations is a generalisation of the well-known Imitation Learning problem in which only state demonstrations are available. We extend its scope to feature-only demonstrations, which could arguably be described as partial observations. By this we mean that the full state of the decision makers is unknown and imitation must take place on the basis of a limited set of features. We seek methods that extract an executable policy directly from those features; in the literature, these would be referred to as Behavioural Cloning methods. Our treatment combines elements from probability and information theory and draws connections with entropy-regularized Markov Decision Processes.
