Paper Title

IRL with Partial Observations using the Principle of Uncertain Maximum Entropy

Authors

Kenneth Bogert, Yikang Gui, Prashant Doshi

Abstract

The principle of maximum entropy is a broadly applicable technique for computing a distribution with the least amount of information possible while constrained to match empirically estimated feature expectations. However, in many real-world applications that use noisy sensors, computing the feature expectations may be challenging due to partial observation of the relevant model variables. For example, a robot performing apprenticeship learning may lose sight of the agent it is learning from due to environmental occlusion. We show that in generalizing the principle of maximum entropy to these types of scenarios, we unavoidably introduce a dependency on the learned model into the empirical feature expectations. We introduce the principle of uncertain maximum entropy and present an expectation-maximization based solution generalized from the principle of latent maximum entropy. Finally, we experimentally demonstrate the improved robustness to noisy data offered by our technique in a maximum causal entropy inverse reinforcement learning domain.
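To make the constraint the abstract refers to concrete, the following is a minimal illustrative sketch, not taken from the paper; the symbols $X$ (model variables), $\Omega$ (noisy observations), $\phi_k$ (features), and the observation model $P(\Omega \mid X)$ are assumptions introduced here for illustration. The standard maximum entropy program chooses the least-informative distribution consistent with the empirical feature expectations $\hat{\phi}_k$:

\[
\max_{P} \; -\sum_{X} P(X)\,\log P(X)
\quad \text{s.t.} \quad
\sum_{X} P(X)\,\phi_k(X) = \hat{\phi}_k \;\; \forall k,
\qquad \sum_{X} P(X) = 1 .
\]

When $X$ is only partially observed through $\Omega$, the empirical side must itself be estimated through the observations, for example as

\[
\hat{\phi}_k \;\approx\; \sum_{\Omega} \tilde{P}(\Omega) \sum_{X} P(X \mid \Omega)\,\phi_k(X),
\qquad
P(X \mid \Omega) \;\propto\; P(\Omega \mid X)\,P(X),
\]

so the right-hand side now depends on the model $P(X)$ being learned. This is the dependency the abstract points out, and it motivates the expectation-maximization style solution generalized from latent maximum entropy.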
