Paper Title
Learning Fair Policies in Multiobjective (Deep) Reinforcement Learning with Average and Discounted Rewards
Paper Authors
Paper Abstract
As the operations of autonomous systems generally affect several users simultaneously, it is crucial that their designs account for fairness considerations. In contrast to standard (deep) reinforcement learning (RL), we investigate the problem of learning a policy that treats its users equitably. In this paper, we formulate this novel RL problem, in which the optimized objective function encodes a notion of fairness that we formally define. For this problem, we provide a theoretical discussion examining both the discounted-reward and the average-reward cases. In the course of this analysis, we notably derive a new result in the standard RL setting, which is of independent interest: it bounds the approximation error, with respect to the optimal average reward, of a policy that is optimal for the discounted reward. Since learning with discounted rewards is generally easier, this discussion further justifies finding a fair policy for the average reward by learning a fair policy for the discounted reward. Accordingly, we describe how several classic deep RL algorithms can be adapted to our fair optimization problem, and we validate our approach with extensive experiments in three different domains.
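To make the abstract's "objective function which encodes a notion of fairness" concrete, below is a minimal illustrative sketch of one common choice in fair multi-objective RL: a generalized Gini social welfare function over the vector of per-user (expected) returns. This is an assumed example of such a fairness-encoding objective, not necessarily the exact formulation used in the paper; the function name `ggf` and the weight vector are hypothetical.

```python
import numpy as np

def ggf(returns, weights):
    """Generalized Gini social welfare of a vector of per-user returns.

    The returns are sorted in increasing order and combined with
    strictly decreasing positive weights, so worse-off users weigh
    more: the function is concave, rewards equality (Pigou-Dalton
    principle), and is impartial (invariant to permuting users).
    """
    v = np.sort(np.asarray(returns, dtype=float))   # ascending: worst-off first
    w = np.asarray(weights, dtype=float)
    if not (np.all(w > 0) and np.all(np.diff(w) < 0)):
        raise ValueError("weights must be positive and strictly decreasing")
    return float(np.dot(w, v))

# With equal total return, the more equal allocation scores higher:
w = [0.5, 0.3, 0.2]
print(ggf([1.0, 1.0, 1.0], w))  # equal allocation
print(ggf([0.0, 1.0, 2.0], w))  # same sum, less equal -> lower welfare
```

In the RL problem the abstract describes, a scalar welfare function of this kind would be applied to the vector of per-user discounted or average rewards, and the resulting scalar would be what the adapted deep RL algorithms optimize.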