Paper Title
Reinforcement Learning Your Way: Agent Characterization through Policy Regularization
Paper Authors
Paper Abstract
The increased complexity of state-of-the-art reinforcement learning (RL) algorithms has resulted in an opacity that inhibits explainability and understanding. This has led to the development of several post-hoc explainability methods that aim to extract information from learned policies, thus aiding explainability. These methods rely on empirical observations of the policy and thus aim to generalize a characterization of agents' behaviour. In this study, we have instead developed a method to imbue a characteristic behaviour into agents' policies through regularization of their objective functions. Our method guides the agents' behaviour during learning, which results in an intrinsic characterization; it connects the learning process with model explanation. We provide a formal argument and empirical evidence for the viability of our method. In future work, we intend to employ it to develop agents that optimize individual financial customers' investment portfolios based on their spending personalities.
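The abstract describes regularizing an agent's objective function to imbue a characteristic behaviour. A minimal sketch of that general idea, for a toy discrete action space: the standard objective (expected return) is penalized by the divergence from a reference "characteristic" policy, so maximizing the regularized objective pulls the learned policy toward the desired behaviour. The function names, the KL-divergence penalty form, and the coefficient `lam` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over action logits.
    z = np.exp(logits - logits.max())
    return z / z.sum()

def regularized_objective(logits, returns, char_policy, lam=0.5):
    """Toy regularized RL objective (illustrative, not the paper's method):
    expected return under the policy minus lam times the KL divergence
    from a reference 'characteristic' policy. A larger lam trades return
    for behaviour closer to char_policy."""
    pi = softmax(logits)
    expected_return = np.dot(pi, returns)
    kl = np.sum(pi * np.log(pi / char_policy))
    return expected_return - lam * kl

# Example: a greedy policy evaluated against a uniform characteristic policy.
logits = np.array([2.0, 0.0, -2.0])
returns = np.array([1.0, 0.5, 0.0])
uniform = np.full(3, 1.0 / 3.0)
unregularized = regularized_objective(logits, returns, uniform, lam=0.0)
regularized = regularized_objective(logits, returns, uniform, lam=1.0)
```

Because the KL divergence is non-negative, the regularized objective is never larger than the plain expected return; a gradient-based learner maximizing it is therefore steered toward the characteristic behaviour during training rather than having the characterization extracted post hoc.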